The present invention relates generally to the field of human genetics. Specifically, the present invention relates to methods and materials used to isolate and detect a human breast and ovarian cancer predisposing gene (BRCA1), some mutant alleles of which cause susceptibility to cancer, in particular breast and ovarian cancer. More specifically, the invention relates to germline mutations in the BRCA1 gene and their use in the diagnosis of predisposition to breast and ovarian cancer. The present invention further relates to somatic mutations in the BRCA1 gene in human breast and ovarian cancer and their use in the diagnosis and prognosis of human breast and ovarian cancer. Finally, the invention relates to the screening of the BRCA1 gene for mutations, which are useful for diagnosing the predisposition to breast and ovarian cancer.




EP 0 705 902 A1 


Description 

The present invention relates generally to the field of human genetics. Specifically, the present invention relates to methods and materials used to isolate and detect a human breast and ovarian cancer predisposing gene (BRCA1), some mutant alleles of which cause susceptibility to cancer, in particular breast and ovarian cancer. More specifically, the invention relates to germline mutations in the BRCA1 gene and their use in the diagnosis of predisposition to breast and ovarian cancer. The present invention further relates to somatic mutations in the BRCA1 gene in human breast and ovarian cancer and their use in the diagnosis and prognosis of human breast and ovarian cancer. Additionally, the invention relates to somatic mutations in the BRCA1 gene in other human cancers and their use in the diagnosis and prognosis of human cancers. The invention also relates to the therapy of human cancers which have a mutation in the BRCA1 gene, including gene therapy, protein replacement therapy and protein mimetics. The invention further relates to the screening of drugs for cancer therapy. Finally, the invention relates to the screening of the BRCA1 gene for mutations, which are useful for diagnosing the predisposition to breast and ovarian cancer.

The publications and other materials used herein to illuminate the background of the invention, and in particular, cases to provide additional details respecting the practice, are incorporated herein by reference, and for convenience are referenced by author and date in the following text and respectively grouped in the appended List of References.

The genetics of cancer is complicated, involving multiple dominant, positive regulators of the transformed state (oncogenes) as well as multiple recessive, negative regulators (tumor suppressor genes). Over one hundred oncogenes have been characterized. Fewer than a dozen tumor suppressor genes have been identified, but the number is expected to increase beyond fifty (Knudson, 1993).

The involvement of so many genes underscores the complexity of the growth control mechanisms that operate in cells to maintain the integrity of normal tissue. This complexity is manifest in another way. So far, no single gene has been shown to participate in the development of all, or even the majority, of human cancers. The most common oncogenic mutations are in the H-ras gene, found in 10-15% of all solid tumors (Anderson et al., 1992). The most frequently mutated tumor suppressor genes are the TP53 gene, homozygously deleted in roughly 50% of all tumors, and CDKN2, which was homozygously deleted in 46% of tumor cell lines examined (Kamb et al., 1994). Without a target that is common to all transformed cells, the dream of a "magic bullet" that can destroy or revert cancer cells while leaving normal tissue unharmed is improbable. The hope for a new generation of specifically targeted antitumor drugs may rest on the ability to identify tumor suppressor genes or oncogenes that play general roles in control of cell division.

The tumor suppressor genes which have been cloned and characterized influence susceptibility to: 1) Retinoblastoma (RB1); 2) Wilms' tumor (WT1); 3) Li-Fraumeni (TP53); 4) Familial adenomatous polyposis (APC); 5) Neurofibromatosis type 1 (NF1); 6) Neurofibromatosis type 2 (NF2); 7) von Hippel-Lindau syndrome (VHL); 8) Multiple endocrine neoplasia type 2A (MEN2A); and 9) Melanoma (CDKN2).

Tumor suppressor loci that have been mapped genetically but not yet isolated include genes for: Multiple endocrine neoplasia type 1 (MEN1); Lynch cancer family syndrome 2 (LCFS2); Neuroblastoma (NB); Basal cell nevus syndrome (BCNS); Beckwith-Wiedemann syndrome (BWS); Renal cell carcinoma (RCC); Tuberous sclerosis 1 (TSC1); and Tuberous sclerosis 2 (TSC2). The tumor suppressor genes that have been characterized to date encode products with similarities to a variety of protein types, including DNA binding proteins (WT1), ancillary transcription regulators (RB1), GTPase activating proteins or GAPs (NF1), cytoskeletal components (NF2), membrane bound receptor kinases (MEN2A), cell cycle regulators (CDKN2) and others with no obvious similarity to known proteins (APC and VHL).

In many cases, the tumor suppressor gene originally identified through genetic studies has been shown to be lost or mutated in some sporadic tumors. This result suggests that regions of chromosomal aberration may signify the position of important tumor suppressor genes involved both in genetic predisposition to cancer and in sporadic cancer.

One of the hallmarks of several tumor suppressor genes characterized to date is that they are deleted at high frequency in certain tumor types. The deletions often involve loss of a single allele, a so-called loss of heterozygosity (LOH), but may also involve homozygous deletion of both alleles. For LOH, the remaining allele is presumed to be nonfunctional, either because of a preexisting inherited mutation, or because of a secondary sporadic mutation.
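
The distinction drawn above between LOH and homozygous deletion can be sketched as a comparison of marker genotypes in normal and tumor tissue from the same individual. This is only an illustrative sketch: the function name, the data layout, and the genotypes are assumptions made for the example, not part of the disclosure. Markers `M3` and `M4` are hypothetical placeholders.

```python
from typing import Dict, Tuple

def classify_markers(normal: Dict[str, Tuple[str, ...]],
                     tumor: Dict[str, Tuple[str, ...]]) -> Dict[str, str]:
    """Compare normal and tumor genotypes marker by marker (illustrative only)."""
    calls = {}
    for marker, alleles in normal.items():
        if len(set(alleles)) < 2:
            # A marker that is homozygous in normal tissue cannot reveal LOH.
            calls[marker] = "uninformative"
        elif marker not in tumor:
            calls[marker] = "no data"
        else:
            tumor_alleles = set(tumor[marker])
            if not tumor_alleles:
                calls[marker] = "homozygous deletion"  # both alleles lost
            elif len(tumor_alleles) == 1:
                calls[marker] = "LOH"                  # one allele lost
            else:
                calls[marker] = "retained"
    return calls

# Hypothetical genotypes; allele labels are arbitrary.
normal = {"D17S855": ("A1", "A2"), "D17S250": ("B1", "B1"),
          "M3": ("C1", "C2"), "M4": ("D1", "D2")}
tumor = {"D17S855": ("A1",), "D17S250": ("B1", "B1"),
         "M3": ("C1", "C2"), "M4": ()}
print(classify_markers(normal, tumor))
```

Only markers that are heterozygous in normal tissue are informative, which is why the homozygous marker is reported separately rather than as "retained".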

Breast cancer is one of the most significant diseases that affects women. At the current rate, American women have a 1 in 6 risk of developing breast cancer by age 95 (American Cancer Society, 1992). Treatment of breast cancer at later stages is often futile and disfiguring, making early detection a high priority in medical management of the disease. Ovarian cancer, although less frequent than breast cancer, is often rapidly fatal and is the fourth most common cause of cancer mortality in American women. Genetic factors contribute to an ill-defined proportion of breast cancer incidence, estimated to be about 5% of all cases but approximately 25% of cases diagnosed before age 40 (Claus et al., 1991). Breast cancer has been subdivided into two types, early-age onset and late-age onset, based on an inflection in the age-specific incidence curve around age 50. Mutation of one gene, BRCA1, is thought to account for approximately 45% of familial breast cancer, but at least 50% of families with both breast and ovarian cancer (Easton et al., 1993).

Intense efforts to isolate the BRCA1 gene have proceeded since it was first mapped in 1990 (Hall et al., 1990; Narod et al., 1991). A second locus, BRCA2, has recently been mapped to chromosome 13q (Wooster et al., 1994) and appears to account for a proportion of early-onset breast cancer roughly equal to BRCA1, but confers a lower risk of ovarian cancer. The remaining susceptibility to early-onset breast cancer is divided between as yet unmapped genes for familial cancer and rarer germline mutations in genes such as TP53 (Malkin et al., 1990). It has also been suggested that heterozygote carriers for defective forms of the Ataxia-Telangiectasia gene are at higher risk for breast cancer (Swift et al., 1976; Swift et al., 1991). Late-age onset breast cancer is also often familial, although the risks in relatives are not as high as those for early-onset breast cancer (Cannon-Albright et al., 1994; Mettlin et al., 1990). However, the percentage of such cases due to genetic susceptibility is unknown.

Breast cancer has long been recognized to be, in part, a familial disease (Anderson, 1972). Numerous investigators have examined the evidence for genetic inheritance and concluded that the data are most consistent with dominant inheritance for a major susceptibility locus or loci (Bishop and Gardner, 1980; Go et al., 1983; Williams and Anderson, 1984; Bishop et al., 1986; Newman et al., 1988; Claus et al., 1991). Recent results demonstrate that at least three loci exist which convey susceptibility to breast cancer as well as other cancers. These loci are the TP53 locus on chromosome 17p (Malkin et al., 1990), a 17q-linked susceptibility locus known as BRCA1 (Hall et al., 1990), and one or more loci responsible for the unmapped residual. Hall et al. (1990) indicated that the inherited breast cancer susceptibility in kindreds with early age onset is linked to chromosome 17q21, although subsequent studies by this group using a more appropriate genetic model partially refuted the limitation to early onset breast cancer (Margaritte et al., 1992).

Most strategies for cloning the 17q-linked breast cancer predisposing gene (BRCA1) require precise genetic localization studies. The simplest model for the functional role of BRCA1 holds that alleles of BRCA1 that predispose to cancer are recessive to wild type alleles; that is, cells that contain at least one wild type BRCA1 allele are not cancerous. However, cells that contain one wild type BRCA1 allele and one predisposing allele may occasionally suffer loss of the wild type allele, either by random mutation or by chromosome loss during cell division (nondisjunction). All the progeny of such a mutant cell lack the wild type function of BRCA1 and may develop into tumors. According to this model, predisposing alleles of BRCA1 are recessive, yet susceptibility to cancer is inherited in a dominant fashion: women who possess one predisposing allele (and one wild type allele) risk developing cancer, because their mammary epithelial cells may spontaneously lose the wild type BRCA1 allele. This model applies to a group of cancer susceptibility loci known as tumor suppressors or antioncogenes, a class of genes that includes the retinoblastoma gene and neurofibromatosis gene. By inference, this model may also explain the BRCA1 function, as has recently been suggested (Smith et al., 1992).
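
The recessive-at-the-cell-level, dominant-at-the-pedigree-level reasoning above can be made concrete with a small probability sketch. It assumes, purely for illustration, that each wild type allele is lost independently with the same per-cell probability and that cells behave independently. The function name, the cell count, and the loss probability are hypothetical and are not taken from the disclosure.

```python
import math

def p_any_cell_loses_function(n_cells: int, p_loss: float, hits_needed: int) -> float:
    """Probability that at least one of n_cells loses every wild type allele.

    hits_needed is 1 for a carrier of one predisposing allele (one wild type
    copy left to lose) and 2 for a non-carrier (two wild type copies).
    Requires 0 <= p_loss < 1.
    """
    p_cell = p_loss ** hits_needed  # all remaining wild type copies lost in one cell
    # 1 - (1 - p_cell) ** n_cells, written to stay accurate for tiny p_cell
    return -math.expm1(n_cells * math.log1p(-p_cell))

# Hypothetical numbers chosen only to show the contrast.
carrier = p_any_cell_loses_function(10**6, 1e-6, hits_needed=1)
non_carrier = p_any_cell_loses_function(10**6, 1e-6, hits_needed=2)
print(f"carrier: {carrier:.3f}, non-carrier: {non_carrier:.2e}")
```

Under these assumed numbers a carrier is far more likely than a non-carrier to have at least one cell with no wild type function, which is how a recessive allele can produce a dominant inheritance pattern.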

A second possibility is that BRCA1 predisposing alleles are truly dominant; that is, a wild type allele of BRCA1 cannot overcome the tumor forming role of the predisposing allele. Thus, a cell that carries both wild type and mutant alleles would not necessarily lose the wild type copy of BRCA1 before giving rise to malignant cells. Instead, mammary cells in predisposed individuals would undergo some other stochastic change(s) leading to cancer.

If BRCA1 predisposing alleles are recessive, the BRCA1 gene is expected to be expressed in normal mammary tissue but not functionally expressed in mammary tumors. In contrast, if BRCA1 predisposing alleles are dominant, the wild type BRCA1 gene may or may not be expressed in normal mammary tissue. However, the predisposing allele will likely be expressed in breast tumor cells.

The 17q linkage of BRCA1 was independently confirmed in three of five kindreds with both breast cancer and ovarian cancer (Narod et al., 1991). These studies claimed to localize the gene within a very large region, 15 centiMorgans (cM), or approximately 15 million base pairs, to either side of the linked marker pCMM86 (D17S74). However, attempts to define the region further by genetic studies, using markers surrounding pCMM86, proved unsuccessful. Subsequent studies indicated that the gene was considerably more proximal (Easton et al., 1993) and that the original analysis was flawed (Margaritte et al., 1992). Hall et al. (1992) recently localized the BRCA1 gene to an approximately 8 cM interval (approximately 6 million base pairs) bounded by Mfd15 (D17S250) on the proximal side and the human GIP gene on the distal side. A slightly narrower interval for the BRCA1 locus, based on publicly available data, was agreed upon at the Chromosome 17 workshop in March of 1992 (Fain, 1992). The size of these regions and the uncertainty associated with them have made it exceedingly difficult to design and implement physical mapping and/or cloning strategies for isolating the BRCA1 gene.

Identification of a breast cancer susceptibility locus would permit the early detection of susceptible individuals and greatly increase our ability to understand the initial steps which lead to cancer. As susceptibility loci are often altered during tumor progression, cloning these genes could also be important in the development of better diagnostic and prognostic products, as well as better cancer therapies.

SUMMARY OF THE INVENTION 


The present invention relates generally to the field of human genetics. Specifically, the present invention relates to methods and materials used to isolate and detect a human breast cancer predisposing gene (BRCA1), some alleles of which cause susceptibility to cancer, in particular breast and ovarian cancer. More specifically, the present invention relates to germline mutations in the BRCA1 gene and their use in the diagnosis of predisposition to breast and ovarian cancer. The invention further relates to somatic mutations in the BRCA1 gene in human breast cancer and their use in the diagnosis and prognosis of human breast and ovarian cancer. Additionally, the invention relates to somatic mutations in the BRCA1 gene in other human cancers and their use in the diagnosis and prognosis of human cancers. The invention also relates to the therapy of human cancers which have a mutation in the BRCA1 gene, including gene therapy, protein replacement therapy and protein mimetics. The invention further relates to the screening of drugs for cancer therapy. Finally, the invention relates to the screening of the BRCA1 gene for mutations, which are useful for diagnosing the predisposition to breast and ovarian cancer.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a diagram showing the order of loci neighboring BRCA1 as determined by the chromosome 17 workshop. Figure 1 is reproduced from Fain, 1992.

Figure 2 is a schematic map of YACs which define part of the Mfd15-Mfd188 region.

Figure 3 is a schematic map of STSs, P1s and BACs in the BRCA1 region.

Figure 4 is a schematic map of human chromosome 17. The pertinent region containing BRCA1 is expanded to indicate the relative positions of two previously identified genes, CA125 and RNU2. BRCA1 spans the marker D17S855.

Figure 5 shows alignment of the BRCA1 zinc-finger domain with 3 other zinc-finger domains that scored highest in a Smith-Waterman alignment. RPT1 encodes a protein that appears to be a negative regulator of the IL-2 receptor in mouse. RIN1 encodes a DNA-binding protein that includes a RING-finger motif related to the zinc-finger. RFP1 encodes a putative transcription factor that is the N-terminal domain of the RET oncogene product. The bottom line contains the C3HC4 consensus zinc-finger sequence showing the positions of cysteines and one histidine that form the zinc ion binding pocket.

Figure 6 is a diagram of BRCA1 mRNA showing the locations of introns and the variants of BRCA1 mRNA produced by alternative splicing. Intron locations are shown by dark triangles and the exons are numbered below the line representing the cDNA. The top cDNA is the composite used to generate the peptide sequence of BRCA1. Alternative forms identified as cDNA clones or hybrid selection clones are shown below.

Figure 7 shows the tissue expression pattern of BRCA1. The blot was obtained from Clontech and contains RNA from the indicated tissues. Hybridization conditions were as recommended by the manufacturer, using a probe consisting of nucleotide positions 3631 to 3930 of BRCA1. Note that both breast and ovary are heterogeneous tissues and the percentage of relevant epithelial cells can be variable. Molecular weight standards are in kilobases.

Figure 8 is a diagram of the 5' untranslated region plus the beginning of the translated region of BRCA1, showing the locations of introns and the variants of BRCA1 mRNA produced by alternative splicing. Intron locations are shown by broken dashed lines. Six alternate splice forms are shown.

Figure 9A shows a nonsense mutation in Kindred 2082. P indicates the person originally screened; b and c are haplotype carriers; a, d, e, f, and g do not carry the BRCA1 haplotype. The C to T mutation results in a stop codon and creates a site for the restriction enzyme AvrII. PCR amplification products are cut with this enzyme. The carriers are heterozygous for the site and therefore show three bands. Non-carriers remain uncut.
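
The band pattern described for Figure 9A can be sketched as a simulated digest. AvrII recognizes CCTAGG and cuts after the first C; everything else in this sketch is an assumption for illustration: the amplicon sequence is invented, the function names are hypothetical, and real fragment sizes depend on the actual PCR product.

```python
AVRII_SITE = "CCTAGG"  # AvrII recognition sequence
CUT_OFFSET = 1         # the enzyme cuts between the first and second base (C^CTAGG)

def digest(seq: str) -> list:
    """Return the fragments produced by cutting seq at every AvrII site."""
    fragments, start = [], 0
    pos = seq.find(AVRII_SITE)
    while pos != -1:
        cut = pos + CUT_OFFSET
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(AVRII_SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments

def band_sizes(alleles) -> list:
    """Distinct fragment lengths expected on a gel for a pair of alleles."""
    sizes = set()
    for allele in alleles:
        sizes.update(len(fragment) for fragment in digest(allele))
    return sorted(sizes, reverse=True)

# Invented 24-base amplicon; a C to T change at index 12 creates the site.
wild_type = "ATGGCTGAACCCCAGGTTCAGTCA"
mutant = wild_type[:12] + "T" + wild_type[13:]

print(band_sizes((wild_type, mutant)))     # heterozygous carrier: three bands
print(band_sizes((wild_type, wild_type)))  # non-carrier: one uncut band
```

A heterozygous carrier shows the uncut wild type product plus the two fragments of the cut mutant product, which is the three-band pattern the figure description refers to.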

Figure 9B shows a mutation and cosegregation analysis in BRCA1 kindreds. Carrier individuals are represented as filled circles and squares in the pedigree diagrams. Frameshift mutation in Kindred 1910. The first three lanes are control, noncarrier samples. Lanes labeled 1-3 contain sequences from carrier individuals. Lane 4 contains DNA from a kindred member who does not carry the BRCA1 mutation. The diamond is used to prevent identification of the kindred. The frameshift resulting from the additional C is apparent in lanes labeled 1, 2, and 3.

Figure 9C shows a mutation and cosegregation analysis in BRCA1 kindreds. Carrier individuals are represented as filled circles and squares in the pedigree diagrams. Inferred regulatory mutation in Kindred 2035. ASO analysis of carriers and noncarriers of 2 different polymorphisms (PM1 and PM7) which were examined for heterozygosity in the germline and compared to the heterozygosity of lymphocyte mRNA. The top 2 rows of each panel contain PCR products amplified from genomic DNA and the bottom 2 rows contain PCR products amplified from cDNA. "A" and "G" are the two alleles detected by the ASO. The dark spots indicate that a particular allele is present in the sample. The first three lanes of PM7 represent the three genotypes in the general population.

Figures 10A-10H show genomic sequence of BRCA1. The lower case letters denote intron sequence while the upper case letters denote exon sequence. Indefinite intervals within introns are designated with vvvvvvvvvvvvv. Known polymorphic sites are shown as underlined and boldface type.
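
The case convention described for Figures 10A-10H lends itself to simple parsing. The sketch below assumes a plain string in which upper case is exon, lower case is intron, and runs of lower case "v" mark indefinite intron intervals. The toy sequence and the function names are invented for illustration and are not taken from the figures.

```python
import re

def split_exons_introns(seq: str) -> list:
    """Split a case-coded genomic sequence into (kind, text) segments."""
    segments = []
    for match in re.finditer(r"[A-Z]+|[a-z]+", seq):
        text = match.group(0)
        kind = "exon" if text.isupper() else "intron"
        segments.append((kind, text))
    return segments

def spliced(seq: str) -> str:
    """Concatenate the exon (upper case) segments only."""
    return "".join(text for kind, text in split_exons_introns(seq) if kind == "exon")

# Toy sequence: two exons separated by an intron with an indefinite interval.
toy = "ATGGATgtaagtvvvvttttagTTATCT"
print(split_exons_introns(toy))
print(spliced(toy))
```

Because the indefinite-interval marker is lower case, it stays inside the intron segment and never leaks into the spliced exon sequence.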



DETAILED DESCRIPTION OF THE INVENTION

The present invention relates generally to the field of human genetics. Specifically, the present invention relates to methods and materials used to isolate and detect a human breast cancer predisposing gene (BRCA1), some alleles of which cause susceptibility to cancer, in particular breast and ovarian cancer. More specifically, the present invention relates to germline mutations in the BRCA1 gene and their use in the diagnosis of predisposition to breast and ovarian cancer. The invention further relates to somatic mutations in the BRCA1 gene in human breast cancer and their use in the diagnosis and prognosis of human breast and ovarian cancer. Additionally, the invention relates to somatic mutations in the BRCA1 gene in other human cancers and their use in the diagnosis and prognosis of human cancers. The invention also relates to the therapy of human cancers which have a mutation in the BRCA1 gene, including gene therapy, protein replacement therapy and protein mimetics. The invention further relates to the screening of drugs for cancer therapy. Finally, the invention relates to the screening of the BRCA1 gene for mutations, which are useful for diagnosing the predisposition to breast and ovarian cancer.

The present invention provides an isolated polynucleotide comprising all, or a portion of the BRCA1 locus or of a mutated BRCA1 locus, preferably at least eight bases and not more than about 100 kb in length. Such polynucleotides may be antisense polynucleotides. The present invention also provides a recombinant construct comprising such an isolated polynucleotide, for example, a recombinant construct suitable for expression in a transformed host cell.

Also provided by the present invention are methods of detecting a polynucleotide comprising a portion of the BRCA1 locus or its expression product in an analyte. Such methods may further comprise the step of amplifying the portion of the BRCA1 locus, and may further include a step of providing a set of polynucleotides which are primers for amplification of the portion of the BRCA1 locus. The method is useful for [...] to an isolated polypeptide comprised of at least five amino acid residues encoded by the BRCA1 locus.

The present invention also provides kits for detecting in an analyte a [...] BRCA1 locus, the kits comprising a polynucleotide complementary to the portion of the BRCA1 locus packaged [...] of preparing a polynucleotide comprising polymerizing nucleotides to yield a sequence comprised of at least eight consecutive nucleotides of the BRCA1 locus, and methods of preparing a polypeptide comprising polymerizing amino acids to yield a sequence comprising at least five amino acids [...]

The present invention further provides methods of screening the BRCA1 gene to identify mutations. Such methods [...]

[...] invention provides methods of screening drugs for cancer therapy [...]

[...] means necessary for production of gene-based [...] cancer cells. These therapeutic agents may take the form of polynucleotides comprising all or a portion of the BRCA1 locus placed in appropriate vectors or delivered to target cells in more direct ways such that the function of the BRCA1 protein is reconstituted. Therapeutic agents may also take the form of polypeptides based on either a portion of, or the entire protein sequence of BRCA1. These may functionally replace the activity of BRCA1 in vivo.

It is a discovery of the present invention that the BRCA1 locus which predisposes individuals to breast cancer and ovarian cancer is a gene encoding a BRCA1 protein, which has been found to have no significant homology with known protein or DNA sequences. This gene is termed BRCA1 herein. It is a discovery of the present invention that mutations in the BRCA1 locus in the germline are indicative of a predisposition to breast cancer and ovarian cancer. Finally, it is a discovery of the present invention that somatic mutations in the BRCA1 locus are also associated with breast cancer, ovarian cancer and other cancers, which represents an indicator of these cancers or of the prognosis of these cancers. The mutational events of the BRCA1 locus can involve deletions, insertions and point mutations within the coding [...]

[...] arm of human chromosome 17 of the human genome, 17q, which has a size estimated at about 8 million base pairs, a region which contains a genetic locus, BRCA1, which causes susceptibility to cancer, including breast and ovarian cancer, has been identified.

The region containing the BRCA1 locus was identified using a variety of genetic techniques. Genetic mapping techniques initially defined the BRCA1 region in terms of recombination with genetic markers. Based upon studies of large extended families ("kindreds") with multiple cases of breast cancer (and ovarian cancer cases in some kindreds), a chromosomal region has been pinpointed that contains the BRCA1 gene as well as other putative susceptibility alleles in the BRCA1 locus. Two meiotic breakpoints have been discovered on the distal side of the BRCA1 locus which are expressed as recombinants between genetic markers and the disease, and one recombinant on the proximal side of the BRCA1 locus. Thus, a region which contains the BRCA1 locus is physically bounded by these markers.

The use of the genetic markers provided by this invention allowed the identification of clones which cover the region from a human yeast artificial chromosome (YAC) or a human bacterial artificial chromosome (BAC) library. It also allowed for the identification and preparation of more easily manipulated cosmid, P1 and BAC clones from this region and the construction of a contig from a subset of the clones. These cosmids, P1s, YACs and BACs provide the basis for cloning the BRCA1 locus and provide the basis for developing reagents effective, for example, in the diagnosis and treatment of breast and/or ovarian cancer. The BRCA1 gene and other potential susceptibility genes have been isolated from this region. The isolation was done using software trapping (a computational method for identifying sequences likely to contain coding exons, from contiguous or discontinuous genomic DNA sequences), hybrid selection techniques and direct screening, with whole or partial cDNA inserts from cosmids, P1s and BACs in the region, to screen cDNA libraries. These methods were used to obtain sequences of loci expressed in breast and other tissue. These candidate loci were analyzed to identify sequences which confer cancer susceptibility. We have discovered that there are mutations in the coding sequence of the BRCA1 locus in kindreds which are responsible for the 17q-linked cancer susceptibility known as BRCA1. This gene was not known to be in this region. The present invention not only facilitates the early detection of certain cancers, so vital to patient survival, but also permits the detection of susceptible individuals before they develop cancer.

Population Resources 

Large, well-documented Utah kindreds are especially important in providing good resources for human genetic studies. Each large kindred independently provides the power to detect whether a BRCA1 susceptibility allele is segregating in that family. Recombinants informative for localization and isolation of the BRCA1 locus could be obtained only from kindreds large enough to confirm the presence of a susceptibility allele. Large sibships are especially important for studying breast cancer, since penetrance of the BRCA1 susceptibility allele is reduced both by age and sex, making informative sibships difficult to find. Furthermore, large sibships are essential for constructing haplotypes of deceased individuals by inference from the haplotypes of their close relatives.

While other populations may also provide beneficial information, such studies generally require much greater effort, and the families are usually much smaller and thus less informative. Utah's age-adjusted breast cancer incidence is 20% lower than the average U.S. rate. The lower incidence in Utah is probably due largely to an early age at first pregnancy, increasing the probability that cases found in Utah kindreds carry a genetic predisposition.

Genetic Mapping 

Given a set of informative families, genetic markers are essential for linking a disease to a region of a chromosome. Such markers include restriction fragment length polymorphisms (RFLPs) (Botstein et al., 1980), markers with a variable number of tandem repeats (VNTRs) (Jeffreys et al., 1985; Nakamura et al., 1987), and an abundant class of DNA polymorphisms based on short tandem repeats (STRs), especially repeats of CpA (Weber and May, 1989; Litt et al., 1989). To generate a genetic map, one selects potential genetic markers and tests them using DNA extracted from members of the kindreds being studied.

Genetic markers useful in searching for a genetic locus associated with a disease can be selected on an ad hoc basis, by densely covering a specific chromosome, or by detailed analysis of a specific region of a chromosome. A preferred method for selecting genetic markers linked with a disease involves evaluating the degree of informativeness of kindreds to determine the ideal distance between genetic markers of a given degree of polymorphism, then selecting markers from known genetic maps which are ideally spaced for maximal efficiency. Informativeness of kindreds is measured by the probability that the markers will be heterozygous in unrelated individuals. It is also most efficient to use STR markers which are detected by amplification of the target nucleic acid sequence using PCR; such markers are highly informative, easy to assay (Weber and May, 1989), and can be assayed simultaneously using multiplexing strategies (Skolnick and Wallace, 1988), greatly reducing the number of experiments required.
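
The probability that a marker is heterozygous in an unrelated individual can be estimated from allele frequencies. The sketch below uses the standard expected-heterozygosity formula, one minus the sum of squared allele frequencies, which assumes random mating (Hardy-Weinberg proportions). The function name and the frequencies are illustrative assumptions, not values from the disclosure.

```python
def expected_heterozygosity(allele_freqs) -> float:
    """Probability that a random individual carries two different alleles.

    Assumes Hardy-Weinberg proportions: H = 1 - sum(p_i ** 2).
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

# Hypothetical markers: a two-allele RFLP versus a ten-allele STR.
print(expected_heterozygosity([0.5, 0.5]))  # 0.5
print(expected_heterozygosity([0.1] * 10))  # roughly 0.9
```

The comparison illustrates why multi-allelic STR markers are described as highly informative: with more alleles at moderate frequency, far more individuals are heterozygous and therefore useful for tracking recombination.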

Once linkage has been established, one needs to find markers that flank the disease locus, i.e., one or more markers proximal to the disease locus and one or more markers distal to the disease locus. Where possible, candidate markers can be selected from a known genetic map. Where none is known, new markers can be identified by the STR technique, as shown in the Examples.
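
Choosing the closest flanking markers can be sketched as follows. The sketch assumes each recombinant observation has already been reduced to a statement that the disease locus lies distal or proximal to a given marker, and that map positions increase from proximal to distal. The marker names, positions, and function name are hypothetical.

```python
def flanking_interval(observations):
    """Return the closest (proximal_flank, distal_flank) markers.

    observations: iterable of (marker, position_cM, locus_side), where
    locus_side is "distal" if recombinants place the disease locus distal
    to the marker, or "proximal" if they place it proximal to the marker.
    """
    proximal_flank = None  # marker on the proximal side, nearest the locus
    distal_flank = None    # marker on the distal side, nearest the locus
    for marker, position, locus_side in observations:
        if locus_side == "distal":
            if proximal_flank is None or position > proximal_flank[1]:
                proximal_flank = (marker, position)
        elif locus_side == "proximal":
            if distal_flank is None or position < distal_flank[1]:
                distal_flank = (marker, position)
        else:
            raise ValueError(f"unknown side: {locus_side!r}")
    return proximal_flank, distal_flank

# Hypothetical markers M1..M4 with invented map positions in cM.
observations = [("M1", 0.0, "distal"), ("M2", 4.0, "distal"),
                ("M3", 10.0, "proximal"), ("M4", 18.0, "proximal")]
print(flanking_interval(observations))  # (('M2', 4.0), ('M3', 10.0))
```

Each newly typed marker either tightens one of the two bounds or leaves the interval unchanged, which is the sense in which mapping proceeds by successive replacement of flanking markers.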

Genetic mapping is usually an iterative process. In the present invention, it began by defining flanking genetic markers around the BRCA1 locus, then replacing these flanking markers with other markers that were successively closer to the BRCA1 locus. As an initial step, recombination events, defined by large extended kindreds, helped specifically to localize the BRCA1 locus as distal or proximal to a specific genetic marker (Goldgar et al., 1994).

The region surrounding BRCA1 until [...] the present invention was not well mapped and there were few markers. Therefore, short repetitive sequences on cosmids subcloned from YACs, which had been physically mapped, were analyzed in order to develop [...] invention, 42D6, was discovered [...] distal flanking marker for the BRCA1 region. Since 42D6 is approximately 14 cM from pCMM86, the BRCA1 region was thus reduced by approximately 14 centiMorgans (Easton et al., 1993). The present invention thus began by [...] linked distal flanking marker of the BRCA1 region. BRCA1 was then discovered to be distal to [...] marker Mfd15. Therefore, BRCA1 was shown to be in a region of 6 to 10 [...] 42D6. Marker Mfd191 was subsequently discovered to be distal to Mfd15 and proximal to BRCA1. [...] Mfd15 was replaced with Mfd191 as the closest proximal genetic marker. Similarly, it was discovered that genetic marker [...] genetic marker 42D6, narrowing the region containing the BRCA1 [...] marker Mfd191 was replaced with tdj1474 as [...] as the distal marker, further narrowing the BRCA1 region to a [...] in the art and described herein.

Physical Mapping

[...] BAC and cosmid clones which cover the region containing the BRCA1 locus.

Yeast Artificial Chromosomes (YACs)

Once a sufficiently small region containing the BRCA1 locus was identified, physical isolation of the DNA in the region proceeded by identifying a set of overlapping [...] which covers the region. Useful YACs can be isolated from known libraries, such as the St. Louis and [...] libraries, which are widely distributed and contain approximately [...] accessible libraries and can be obtained from a number [...]

Cosmid, P1 and BAC Clones

In the present invention, it is advantageous to proceed by [...] The smaller size of these inserts, compared to YAC inserts, makes [...] Furthermore, having the cloned DNA in bacterial cells [...] greatly increases the ease with which the DNA of interest can be manipulated, and improves the [...] For cosmid subclones of YACs, the DNA is [...]

[...] define an overlapping contiguous set of clones which cover [...] herein as a "minimum tiling path". Such a minimum tiling path forms the basis for subsequent experiments to identify cDNAs which may originate from the BRCA1 locus.
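
One common way to compute a minimum tiling path from a set of overlapping clones is the greedy interval-cover algorithm sketched below. This is offered as an illustration of the concept, not as the procedure used in the disclosure. It assumes each clone has already been placed on a common coordinate system; the clone names and coordinates are invented.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Pick a smallest set of overlapping clones covering the region.

    clones: iterable of (name, start, end) on a shared coordinate system.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path, covered_to, i = [], region_start, 0
    while covered_to < region_end:
        best = None
        # Among clones starting inside the covered part, take the one
        # that extends coverage the furthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path

# Invented clones with coordinates in kilobases.
clones = [("c1", 0, 40), ("c2", 10, 35), ("c3", 30, 80),
          ("c4", 38, 70), ("c5", 75, 120)]
print(minimum_tiling_path(clones, 0, 120))  # ['c1', 'c3', 'c5']
```

The error raised on an uncovered position corresponds to the practical situation where a gap in the contig must be filled with additional clones before a tiling path exists.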

Coverage of the Gap with P1 and BAC Clones

To cover any gaps in the BRCA1 contig between the [...] vectors which contain inserts of genomic DNA roughly twice as large [...] 1992) were used. P1 clones were isolated by Genome [...] provided by us for screening. BACs were provided by hybridization techniques [...] for candidate genes, as described below.

Gene Isolation

There are many techniques [...] the coding sequence of a locus one is attempting to isolate, including but not limited
to:

a. zoo blots
b. identifying HTF islands
c. exon trapping
d. hybridizing cDNA to cosmids or YACs
e. screening cDNA libraries

(a) Zoo blots 

The first technique is to hybridize cosmids to Southern blots [...] containing DNA from a variety of species are
commercially available (Clonetech, Cat 7753- ).

(b) Identifying HTF islands

[...] for sites which contain CpG dimers cut frequently in these regions (Lindsay ).

(c) Exon trapping 

The third technique is exon trapping, a method that [...] junctions and therefore are likely to comprise coding
sequences [...] used to select and amplify exons from DNA clones [...]. [...] amplification is based on the selection
of RNA sequences which are flanked by functional 5' [...] sites. The products of the exon amplification are used [...]
number of candidate genes for further study. Exon trapping [...]

(d) Hybridizing cDNA to Cosmids, P1s, BACs or YACs

The fourth technique is a modification of the selective enrichment technique which utilizes hybridization of cDNA
[...] P1s, BACs or YACs and permits transcribed [...] (Kandpal et al., 1990). The selective enrichment [...] cDNAs in
the region represented by the cloned genomic DNA.

(e) Identification of cDNAs 

[...] tissue cDNA libraries, ovarian cDNA libraries, and [...]. Another variation on the theme of direct selection of
cDNA was also used to find candidate genes for BRCA1 (Lovett [...]). [...] DNA is digested with
[...] as binding sites for primers in subsequent PCR amplification reactions using biotinylated primers. Target cDNA
is generated from mRNA derived from tissue samples, e.g., breast tissue, by synthesis of either random primed or
oligo(dT) primed first strand followed by second strand synthesis. The cDNA ends are rendered blunt and ligated onto
double-stranded adapters. These adapters serve as amplification sites for PCR. The target and probe sequences are
denatured and mixed with human Cot-1 DNA to block repetitive sequences. Solution hybridization is carried out to high
Cot-1/2 values to ensure hybridization of rare target cDNA molecules. The annealed material is then captured on avidin
beads, washed at high stringency and the retained cDNAs are eluted and amplified by PCR. The selected cDNA is
subjected to further rounds of enrichment before cloning into a plasmid vector for analysis.

Testing the cDNA for Candidacy

Proof that the cDNA is the BRCA1 locus is obtained by finding sequences in DNA extracted from affected kindred
members which create abnormal BRCA1 gene products or abnormal levels of BRCA1 gene product. Such BRCA1
susceptibility alleles will co-segregate with the disease in large kindreds. They will also be present at a much higher
frequency in non-kindred individuals with breast and ovarian cancer than in individuals in the general population.
Finally, since tumors often mutate somatically at loci which are in other instances mutated in the germline, we expect
to see normal germline BRCA1 alleles mutated into sequences which are identical or similar to BRCA1 susceptibility
alleles in DNA extracted from tumor tissue. Whether one is comparing BRCA1 sequences from tumor tissue to BRCA1
alleles from the germline of the same individuals, or one is comparing germline BRCA1 alleles from cancer cases to
those from unaffected individuals, the key is to find mutations which are serious enough to cause obvious disruption
to the normal function of the gene product. These mutations can take a number of forms. The most severe forms would be
frame shift mutations or large deletions which would cause the gene to code for an abnormal protein or one which would
significantly alter protein expression. Less severe disruptive mutations would include small in-frame deletions and
nonconservative base pair substitutions which would have a significant effect on the protein produced, such as changes
to or from a cysteine residue, from a basic to an acidic amino acid or vice versa, from a hydrophobic to hydrophilic
amino acid or vice versa, or other mutations which would affect secondary, tertiary or quaternary protein structure.
Silent mutations or those resulting in conservative amino acid substitutions would not generally be expected to
disrupt protein function.
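
The severity ranking described above can be illustrated with a minimal Python sketch. It is not part of the
specification. The function names, the category labels, and the toy 15-nucleotide sequence are assumptions made for
illustration. Only two of the listed nonconservative criteria (cysteine changes and basic/acidic swaps) are
implemented, so the "missense (other)" label does not imply that a substitution is conservative.

```python
# Standard genetic code, written in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

BASIC = set("KRH")   # basic residues
ACIDIC = set("DE")   # acidic residues


def translate(coding_seq):
    """Translate an uppercase DNA coding sequence up to the first stop codon."""
    protein = []
    for pos in range(0, len(coding_seq) - 2, 3):
        residue = CODON_TABLE[coding_seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def is_nonconservative(old, new):
    """Cysteine gain/loss or a basic <-> acidic swap (hydrophobicity is not modelled)."""
    if "C" in (old, new):
        return True
    return (old in BASIC and new in ACIDIC) or (old in ACIDIC and new in BASIC)


def classify_mutation(wild_type, mutant):
    """Roughly rank a coding-sequence change in the order used in the text."""
    length_change = len(mutant) - len(wild_type)
    if length_change % 3 != 0:
        return "frameshift"
    if length_change != 0:
        return "in-frame insertion/deletion"
    wt_protein, mut_protein = translate(wild_type), translate(mutant)
    if wt_protein == mut_protein:
        return "silent"
    if len(mut_protein) < len(wt_protein):
        return "nonsense (premature stop)"
    if len(mut_protein) > len(wt_protein):
        return "stop codon lost"
    changes = [(a, b) for a, b in zip(wt_protein, mut_protein) if a != b]
    if any(is_nonconservative(a, b) for a, b in changes):
        return "missense (nonconservative)"
    return "missense (other)"


if __name__ == "__main__":
    toy = "ATGTGTGAAAAATAA"  # hypothetical sequence encoding M-C-E-K-stop
    print(classify_mutation(toy, "ATGTGGAAAAATAA"))   # one base deleted
    print(classify_mutation(toy, "ATGCGTGAAAAATAA"))  # Cys -> Arg
```

A real analysis would compare against the BRCA1 coding sequence of SEQ ID NO:1 rather than a toy string.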
According to the diagnostic and prognostic method of the present invention, alteration of the wild-type BRCA1 locus
is detected. In addition, the method can be performed by detecting the wild-type BRCA1 locus and confirming the lack
of a predisposition to cancer at the BRCA1 locus. "Alteration of a wild-type gene" encompasses all forms of mutations
including deletions, insertions and point mutations in the coding and noncoding regions. Deletions may be of the entire
gene or of only a portion of the gene. Point mutations may result in stop codons, frameshift mutations or amino acid
substitutions. Somatic mutations are those which occur only in certain tissues, e.g., in the tumor tissue, and are not
inherited in the germline. Germline mutations can be found in any of a body's tissues and are inherited. If only a
single allele is somatically mutated, an early neoplastic state is indicated. However, if both alleles are somatically
mutated, then a late neoplastic state is indicated. The finding of BRCA1 mutations thus provides both diagnostic and
prognostic information. A BRCA1 allele which is not deleted (e.g., found on the sister chromosome to a chromosome
carrying a BRCA1 deletion) can be screened for other mutations, such as insertions, small deletions and point
mutations. It is believed that many mutations found in tumor tissues will be those leading to decreased expression of
the BRCA1 gene product. However, mutations leading to non-functional gene products would also lead to a cancerous
state. Point mutational events may occur in regulatory regions, such as in the promoter of the gene, leading to loss or
diminution of expression of the mRNA. Point mutations may also abolish proper RNA processing, leading to loss of
expression of the BRCA1 gene product, or to a decrease in mRNA stability or translation efficiency.

Useful diagnostic techniques include, but are not limited to, fluorescent in situ hybridization (FISH), direct DNA
sequencing, PFGE analysis, Southern blot analysis, single stranded conformation analysis (SSCA), RNase protection
assay, allele-specific oligonucleotide (ASO), dot blot analysis and PCR-SSCP, as discussed in detail further below.

Predisposition to cancers, such as breast and ovarian cancer and the other cancers identified herein, can be
ascertained by testing any tissue of a human for mutations of the BRCA1 gene. For example, a person who has inherited
a germline BRCA1 mutation would be prone to develop cancers. This can be determined by testing DNA from any tissue
of the person's body. Most simply, blood can be drawn and DNA extracted from the cells of the blood. In addition,
prenatal diagnosis can be accomplished by testing fetal cells, placental cells or amniotic cells for mutations of the
BRCA1 gene. Alteration of a wild-type BRCA1 allele, whether, for example, by point mutation or deletion, can be
detected by any of the means discussed herein.

There are several methods that can be used to detect DNA sequence variation. Direct DNA sequencing, either
manual sequencing or automated fluorescent sequencing, can detect sequence variation. For a gene as large as BRCA1,
manual sequencing is very labor-intensive, but under optimal conditions, mutations in the coding sequence of a gene
are rarely missed. Another approach is the single-stranded conformation polymorphism assay (SSCA) (Orita et al.,
1989). This method does not detect all sequence changes, especially if the DNA fragment size is greater than 200 bp,
but can be optimized to detect most DNA sequence variation. The reduced detection sensitivity is a disadvantage, but
the increased throughput possible with SSCA makes it an attractive, viable alternative to direct sequencing for
mutation detection on a research basis. The fragments which have shifted mobility on SSCA gels are then sequenced to
determine the exact nature of the DNA sequence variation. Other approaches based on the detection of mismatches
between the two complementary DNA strands include clamped denaturing gel electrophoresis (CDGE) (Sheffield et al.,
1991), heteroduplex analysis (HA) (White et al., 1992) and chemical mismatch cleavage (CMC) (Grompe et al., 1989). None
of the methods described above will detect large deletions, duplications or insertions, nor will they detect a
regulatory mutation which affects transcription or translation of the protein. Other methods which might detect these
classes of mutations, such as a protein truncation assay or the asymmetric assay, detect only specific types of
mutations and would not detect missense mutations. A review of currently available methods of detecting DNA sequence
variation can be found in a recent review by Grompe (1993). Once a mutation is known, an allele specific detection
approach such as allele specific oligonucleotide (ASO) hybridization can be utilized to rapidly screen large numbers
of other samples for that same mutation.
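
Because SSCA sensitivity is said above to fall off for fragments larger than 200 bp, a long target is typically
screened as a series of short overlapping fragments. The sketch below only illustrates that bookkeeping. The function
name and the 20 bp overlap are assumptions, and the 200 bp ceiling is the only figure taken from the text.

```python
def tile_amplicons(target_length, max_size=200, overlap=20):
    """Split a target of target_length bp into overlapping (start, end) windows.

    Windows are half-open intervals no longer than max_size bp; consecutive
    windows share `overlap` bp so that no position sits only at a fragment edge.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if not 0 <= overlap < max_size:
        raise ValueError("overlap must be non-negative and smaller than max_size")
    step = max_size - overlap
    windows = []
    start = 0
    while True:
        end = min(start + max_size, target_length)
        windows.append((start, end))
        if end == target_length:
            break
        start += step
    return windows


if __name__ == "__main__":
    # A hypothetical 500 bp exon screened in fragments of at most 200 bp.
    for start, end in tile_amplicons(500):
        print(start, end, end - start)
```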

In order to detect the alteration of the wild-type BRCA1 gene in a tissue, it is helpful to isolate the tissue free
from surrounding normal tissues. Means for enriching tissue preparation for tumor cells are known in the art. For
example, the tissue may be isolated from paraffin or cryostat sections. Cancer cells may also be separated from normal
cells by flow cytometry. These techniques, as well as other techniques for separating tumor cells from normal cells,
are well known in the art. If the tumor tissue is highly contaminated with normal cells, detection of mutations is more
difficult.

A rapid preliminary analysis to detect polymorphisms in DNA sequences can be performed by looking at a series
of Southern blots of DNA cut with one or more restriction enzymes, preferably with a large number of restriction
enzymes. Each blot contains a series of normal individuals and a series of cancer cases, tumors, or both. Southern
blots displaying hybridizing fragments (differing in length from control DNA when probed with sequences near or
including the BRCA1 locus) indicate a possible mutation. If restriction enzymes which produce very large restriction
fragments are used, then pulsed field gel electrophoresis (PFGE) is employed.
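
The logic of the restriction-fragment comparison can be sketched in a few lines of Python. This is an in silico
illustration only, not the blotting procedure itself. The function name and the toy sequences are assumptions. EcoRI
(recognition site GAATTC, cutting after the first G) is used simply as a familiar example enzyme.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Return fragment lengths (longest first) after cutting a linear sequence.

    site is the recognition sequence; cut_offset is the number of bases
    from the start of the site to the cut position on the strand shown.
    """
    lengths = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        if cut > start:
            lengths.append(cut - start)
            start = cut
        position = sequence.find(site, position + 1)
    if len(sequence) > start:
        lengths.append(len(sequence) - start)
    return sorted(lengths, reverse=True)


if __name__ == "__main__":
    # Hypothetical control and test sequences; the test sample has lost one site.
    control = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
    sample = "AAAA" + "GAATTC" + "CCCCCC" + "GAGTTC" + "TT"
    control_pattern = fragment_lengths(control, "GAATTC", 1)
    sample_pattern = fragment_lengths(sample, "GAATTC", 1)
    if control_pattern != sample_pattern:
        print("fragment pattern differs from control:", sample_pattern, "vs", control_pattern)
```

A changed pattern only flags a possible mutation, as the text notes; it may equally reflect a neutral polymorphism.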

Detection of point mutations may be accomplished by molecular cloning of the BRCA1 allele(s) and sequencing the
allele(s) using techniques well known in the art. Alternatively, the gene sequences can be amplified directly from a
genomic DNA preparation from the tumor tissue, using known techniques. The DNA sequence of the amplified sequences
can then be determined.

There are six well known methods for a more complete, yet still indirect, test for confirming the presence of a
susceptibility allele: 1) single stranded conformation analysis (SSCA) (Orita et al., 1989); 2) denaturing gradient
gel electrophoresis (DGGE) (Wartell et al., 1990; Sheffield et al., 1989); 3) RNase protection assays (Finkelstein et
al., 1990; Kinszler et al., 1991); 4) allele-specific oligonucleotides (ASOs) (Conner et al., 1983); 5) the use of
proteins which recognize nucleotide mismatches, such as the E. coli mutS protein (Modrich, 1991); and 6) allele-specific
PCR (Rano & Kidd, 1989). For allele-specific PCR, primers are used which hybridize at their 3' ends to a particular
BRCA1 mutation. If the particular BRCA1 mutation is not present, an amplification product is not observed.
Amplification Refractory Mutation System (ARMS) can also be used, as disclosed in European Patent Application
Publication No. 0332435 and in Newton et al., 1989. Insertions and deletions of genes can also be detected by cloning,
sequencing and amplification. In addition, restriction fragment length polymorphism (RFLP) probes for the gene or
surrounding marker genes can be used to score alteration of an allele or an insertion in a polymorphic fragment. Such a
method is particularly useful for screening relatives of an affected individual for the presence of the BRCA1 mutation
found in that individual. Other techniques for detecting insertions and deletions as known in the art can be used.
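
The allele-specific PCR principle described above (a primer whose 3' end sits on the mutation only extends when that
base is matched) can be sketched as a simple string check. This is a deliberately simplified model with hypothetical
names and toy sequences. The primer is written 5' to 3' in the same orientation as the displayed strand, and real
primer behaviour also depends on annealing temperature and other conditions not modelled here.

```python
def primer_amplifies(template, primer, start, max_internal_mismatches=0):
    """Predict whether a primer placed at `start` would be extended.

    Extension is assumed to require a perfect match at the primer's 3'
    terminal base; up to max_internal_mismatches are tolerated elsewhere.
    """
    if not primer:
        raise ValueError("primer must not be empty")
    target = template[start:start + len(primer)]
    if len(target) != len(primer):
        return False  # primer runs off the end of the template
    three_prime_matches = target[-1] == primer[-1]
    internal_mismatches = sum(a != b for a, b in zip(target[:-1], primer[:-1]))
    return three_prime_matches and internal_mismatches <= max_internal_mismatches


if __name__ == "__main__":
    wild_type = "ACGTACGTAC"   # hypothetical wild-type segment
    mutant = "ACGTACGAAC"      # same segment with a T -> A change at index 7
    mutation_primer = "GTACGA"  # 3' terminal base lies on the mutant base
    print("mutant allele:", primer_amplifies(mutant, mutation_primer, 2))
    print("wild-type allele:", primer_amplifies(wild_type, mutation_primer, 2))
```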

In the first three methods (SSCA, DGGE and RNase protection assay), a new electrophoretic band appears. SSCA
detects a band which migrates differentially because the sequence change causes a difference in single-strand,
intramolecular base pairing. RNase protection involves cleavage of the mutant polynucleotide into two or more smaller
fragments. DGGE detects differences in migration rates of mutant sequences compared to wild-type sequences, using a
denaturing gradient gel. In an allele-specific oligonucleotide assay, an oligonucleotide is designed which detects a
specific sequence, and the assay is performed by detecting the presence or absence of a hybridization signal. In the
mutS assay, the protein binds only to sequences that contain a nucleotide mismatch in a heteroduplex between mutant and
wild-type sequences.
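
The mismatch-based assays above all depend on locating positions where a wild-type strand and a mutant strand fail to
pair in a heteroduplex. A minimal sketch of that idea follows. The function names and toy sequences are assumptions,
the partner strand is given already aligned (written 3' to 5'), and cleavage is idealised as occurring exactly at each
mismatch.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_positions(strand, partner_3_to_5):
    """Return indices where two aligned DNA strands are not complementary."""
    if len(strand) != len(partner_3_to_5):
        raise ValueError("strands must be aligned and of equal length")
    return [
        index
        for index, (base, partner) in enumerate(zip(strand, partner_3_to_5))
        if COMPLEMENT[base] != partner
    ]


def cleavage_fragments(length, mismatch_sites):
    """Fragment lengths if the duplex is cut at every mismatch site."""
    bounds = [0] + sorted(mismatch_sites) + [length]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


if __name__ == "__main__":
    probe = "ACGTACGT"            # hypothetical wild-type probe strand
    mutant_partner = "TGCAAGCA"   # partner strand carrying one substitution
    sites = mismatch_positions(probe, mutant_partner)
    print(sites, cleavage_fragments(len(probe), sites))
```

A fully matched duplex yields a single full-length fragment, which corresponds to the absence of a new band.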

Mismatches, according to the present invention, are hybridized nucleic acid duplexes in which the two strands are
not 100% complementary. Lack of total homology may be due to deletions, insertions, inversions or substitutions.
Mismatch detection can be used to detect point mutations in the gene or in its mRNA product. While these techniques are
less sensitive than sequencing, they are simpler to perform on a large number of tumor samples. An example of a
mismatch cleavage technique is the RNase protection method. In the practice of the present invention, the method
involves the use of a labeled riboprobe which is complementary to the human wild-type BRCA1 gene coding sequence.
The riboprobe and either mRNA or DNA isolated from the tumor tissue are annealed (hybridized) together and
subsequently digested with the enzyme RNase A, which is able to detect some mismatches in a duplex RNA structure. If a
mismatch is detected by RNase A, it cleaves at the site of the mismatch. Thus, when the annealed RNA preparation is
[...] sequence of the BRCA1 open reading frame shown in SEQ ID NO:1, design of particular primers is well within the
skill of the art.

The nucleic acid probes provided by the present invention are useful for a number of purposes. They can be used
in Southern hybridization to genomic DNA and in the RNase protection method for detecting point mutations already
discussed above. The probes can be used to detect PCR amplification products. They may also be used to detect
mismatches with the BRCA1 gene or mRNA using other techniques.

It has been discovered that individuals with the wild-type BRCA1 gene do not have cancer which results from the
BRCA1 allele. However, mutations which interfere with the function of the BRCA1 protein are involved in the
pathogenesis of cancer. Thus, the presence of an altered (or a mutant) BRCA1 gene which produces a protein having a
loss of function, or altered function, directly correlates to an increased risk of cancer. In order to detect a BRCA1
gene mutation, a biological sample is prepared and analyzed for a difference between the sequence of the BRCA1 allele
being analyzed and the sequence of the wild-type BRCA1 allele. Mutant BRCA1 alleles can be initially identified by any
of the techniques described above. The mutant alleles are then sequenced to identify the specific mutation of the
particular mutant allele. Alternatively, mutant BRCA1 alleles can be initially identified by identifying mutant
(altered) BRCA1 proteins, using conventional techniques. The mutant alleles are then sequenced to identify the specific
mutation for each allele. The mutations, especially those which lead to an altered function of the BRCA1 protein, are
then used for the diagnostic and prognostic methods of the present invention.

Definitions 

The present invention employs the following definitions:

"Amplification ot Polynucleotides" utilizes methods such as the polymerase chain reaction (PCR), ligation am- 
plification (or ligase chain reaction. LCR) and amplification methods based on the use of Q-beta replicase These meth- 
ods are well known and widely practiced in the art See, e g U S Patents 4,683.195 and 4,683,202 and inms et at . 
2 s 1 990 (for PCR) and Wu el al . 1 989a (for LCR) Reagents and hardware for conducting PCR are commercially available 
Primers useful to amplify sequences from the BRCAI region are preferably complementary to. and hybridize specifically 
to sequences in the BRCAI region or in regions that flank a target region therein. BRCAI sequences generated by 
amplification may be sequenced directly. Alternatively, but less desirably, the amplified sequence(s) may be cloned prior 
to sequence analysis. A method for the direct cloning and sequence analysis of enzymatically amplified genomic seg- 
30 merits has been described by Scharf 1 986 

"Analyte polynucleotide" and "analyte strand" refer to a single- or double-stranded polynucleotide which is 
suspected of containing a target sequence, and which may be present in a variety of types of samples, including biological 
samples 

"Antibodies." The present invention also provides polyclonal and/or monoclonal antibodies and fragments thereof,
and immunologic binding equivalents thereof, which are capable of specifically binding to the BRCA1 polypeptides and
fragments thereof or to polynucleotide sequences from the BRCA1 region, particularly from the BRCA1 locus or a portion
thereof. The term "antibody" is used both to refer to a homogeneous molecular entity, or a mixture such as a serum
product made up of a plurality of different molecular entities. Polypeptides may be prepared synthetically in a peptide
synthesizer and coupled to a carrier molecule (e.g., keyhole limpet hemocyanin) and injected over several months into
rabbits. Rabbit sera is tested for immunoreactivity to the BRCA1 polypeptide or fragment. Monoclonal antibodies may
be made by injecting mice with the protein polypeptides, fusion proteins or fragments thereof. Monoclonal antibodies
will be screened by ELISA and tested for specific immunoreactivity with BRCA1 polypeptide or fragments thereof. See
Harlow & Lane, 1988. These antibodies will be useful in assays as well as pharmaceuticals.

Once a sufficient quantity of desired polypeptide has been obtained, it may be used for various purposes. A typical
use is the production of antibodies specific for binding. These antibodies may be either polyclonal or monoclonal, and
may be produced by in vitro or in vivo techniques well known in the art. For production of polyclonal antibodies, an
appropriate target immune system, typically mouse or rabbit, is selected. Substantially purified antigen is presented
to the immune system in a fashion determined by methods appropriate for the animal and by other parameters well known
to immunologists. Typical sites for injection are in footpads, intramuscularly, intraperitoneally, or intradermally. Of
course, other species may be substituted for mouse or rabbit. Polyclonal antibodies are then purified using techniques
known in the art, adjusted for the desired specificity.

An immunological response is usually assayed with an immunoassay. Normally, such immunoassays involve some
purification of a source of antigen, for example, that produced by the same cells and in the same fashion as the
antigen. A variety of immunoassay methods are well known in the art. See, e.g., Harlow & Lane, 1988, or Goding, 1986.

Monoclonal antibodies with affinities of 10^-8 M^-1 or preferably 10^-9 to 10^-10 M^-1 or stronger will typically be
made by standard procedures as described, e.g., in Harlow & Lane, 1988 or Goding, 1986. Briefly, appropriate animals
will be selected and the desired immunization protocol followed. After the appropriate period of time, the spleens of
such animals are excised and individual spleen cells fused, typically, to immortalized myeloma cells under appropriate
selection conditions. Thereafter, the cells are clonally separated and the supernatants of each clone tested for their
production of an appropriate antibody specific for the desired region of the antigen.

Other suitable techniques involve in vitro exposure of lymphocytes to the antigenic polypeptides, or alternatively,
to selection of libraries of antibodies in phage or similar vectors. See Huse et al., 1989. The polypeptides and
antibodies of the present invention may be used with or without modification. Frequently, polypeptides and antibodies
will be labeled by joining, either covalently or non-covalently, a substance which provides for a detectable signal. A
wide variety of labels and conjugation techniques are known and are reported extensively in both the scientific and
patent literature. Suitable labels include radionuclides, enzymes, substrates, cofactors, inhibitors, fluorescent
agents, chemiluminescent agents, magnetic particles and the like. Patents teaching the use of such labels include U.S.
Patents 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241. Also, recombinant
immunoglobulins may be produced (see U.S. Patent 4,816,567).

"Binding partner" refers to a molecule capable of binding a ligand molecule with high specificity as for example 
an antigen and an antigen-specific antibody or an enzyme and its inhibitor In general the specific binding partners must 
bind with sufficient affinity to immobilize the analyte copy/complementary strand duplex (in the case of polynucleotide 
15 hybridization) under the isolation conditions. Specific binding partners are known in the art and include for example, 
biotin and avidin or streptavidm, IgG and protein A, the numerous, known receptor-ligand couples, and complementary 
polynucleotide strands In the case of complementary polynucleotide binding partners the partners are normally at least 
about 1 5 bases in length, and may be at least 40 bases in length. The polynucleotides may be composed of DNA. RNA. 
or synthetic nucleotide analogs 

A "biological sample" refers to a sample of tissue or fluid suspected of containing an analyte polynucleotide or
polypeptide from an individual including, but not limited to, e.g., plasma, serum, spinal fluid, lymph fluid, the
external sections of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, blood cells, tumors,
organs, tissue and samples of in vitro cell culture constituents.

As used herein, the terms "diagnosing" or "prognosing," as used in the context of neoplasia, are used to indicate
1) the classification of lesions as neoplasia, 2) the determination of the severity of the neoplasia, or 3) the
monitoring of the disease progression, prior to, during and after treatment.

" Encode - . A polynucleotide is said to 'encode" a polypeptide if. in its native state or when manipulated by methods 
well known to those skilled in the art. it can be transcribed ancVor translated to produce the mRNA for and/or the polypep- 
tide or a fragment thereof The anti-sense strand is the complement of such a nucleic acid, and the encoding sequence 
30 can be deduced therefrom 
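
The statement that the encoding sequence can be deduced from the anti-sense strand can be made concrete with a short
sketch. It is illustrative only. The function names and the 15-nucleotide example are assumptions, and the standard
genetic code is used.

```python
# Standard genetic code, written in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate(coding_seq):
    """Translate an uppercase DNA coding sequence up to the first stop codon."""
    protein = []
    for pos in range(0, len(coding_seq) - 2, 3):
        residue = CODON_TABLE[coding_seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    antisense = "TTATTTTTCACACAT"           # hypothetical anti-sense strand, 5' to 3'
    encoding = reverse_complement(antisense)  # deduced encoding (sense) strand
    print(encoding, translate(encoding))
```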

"Isolated” or "substantially pure". An "isolated" or "substantially pure’ nucleic acid (e g , an RNA, DNA or a 
mixed polymer) is one which is substantially separated from other cellular components which naturally accompany a 
native human sequence or protein, e g., ribosomes, polymerases, many other human genome sequences and proteins 
The term embraces a nucleic acid sequence or protein which has been removed from its naturally occurring environment, 
35 and includes recombinant or cloned DNA isolates and chemically synthesized analogs or analogs biologically synthe- 
sized by heterologous systems 

"BRCA1 Allele" refers to normal alleles of the BRCA1 locus as well as alleles carrying variations that predispose 
individuals to develop cancer of many sites including, for example, breast ovarian, colorectal and prostate cancer Such 
predisposing alleles are also called "BRCA1 susceptibility alleles". 

"BRCA1 Locus," "BRCA1 Gene," "BRCA1 Nucleic Acids" or "BRCA1 Polynucleotide" each refer to polynucleotides,
all of which are in the BRCA1 region, that are likely to be expressed in normal tissue, certain alleles of
which predispose an individual to develop breast, ovarian, colorectal and prostate cancers. Mutations at the BRCA1
locus may be involved in the initiation and/or progression of other types of tumors. The locus is indicated in part by
mutations that predispose individuals to develop cancer. These mutations fall within the BRCA1 region described infra.
The BRCA1 locus is intended to include coding sequences, intervening sequences and regulatory elements controlling
transcription and/or translation. The BRCA1 locus is intended to include all allelic variations of the DNA sequence.

These terms, when applied to a nucleic acid, refer to a nucleic acid which encodes a BRCA1 polypeptide, fragment,
homolog or variant, including, e.g., protein fusions or deletions. The nucleic acids of the present invention will
possess a sequence which is either derived from, or substantially similar to, a natural BRCA1-encoding gene or one
having substantial homology with a natural BRCA1-encoding gene or a portion thereof. The coding sequence for a BRCA1
polypeptide is shown in SEQ ID NO:1, with the amino acid sequence shown in SEQ ID NO:2.

The polynucleotide compositions of this invention include RNA, cDNA, genomic DNA, synthetic forms, and mixed
polymers, both sense and antisense strands, and may be chemically or biochemically modified or may contain non-natural
or derivatized nucleotide bases, as will be readily appreciated by those skilled in the art. Such modifications
include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an
analog, internucleotide modifications such as uncharged linkages (e.g., methyl phosphonates, phosphotriesters,
phosphoamidates, carbamates, etc.), charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), pendent
moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, etc.), chelators, alkylators, and modified
linkages (e.g., alpha anomeric nucleic acids, etc.). Also included are synthetic molecules that mimic polynucleotides
in their ability to bind to a designated sequence via hydrogen bonding and other chemical interactions. Such molecules
are known in the art and include, for example, those in which peptide linkages substitute for phosphate linkages in the
backbone of the molecule.

The present invention provides recombinant nucleic acids comprising all or part of the BRCA1 region. The recombinant
construct may be capable of replicating autonomously in a host cell. Alternatively, the recombinant construct may
become integrated into the chromosomal DNA of the host cell. Such a recombinant polynucleotide comprises a
polynucleotide of genomic, cDNA, semi-synthetic, or synthetic origin which, by virtue of its origin or manipulation,
1) is not associated with all or a portion of a polynucleotide with which it is associated in nature; 2) is linked to
a polynucleotide other than that to which it is linked in nature; or 3) does not occur in nature.

Therefore, recombinant nucleic acids comprising sequences otherwise not naturally occurring are provided by this
invention. Although the wild-type sequence may be employed, it will often be altered, e.g., by deletion, substitution
or insertion.

cDNA or genomic libraries of various types may be screened as natural sources of the nucleic acids of the present
invention, or such nucleic acids may be provided by amplification of sequences resident in genomic DNA or other natural
sources, e.g., by PCR. The choice of cDNA libraries normally corresponds to a tissue source which is abundant in mRNA
for the desired proteins. Phage libraries are normally preferred, but other types of libraries may be used. Clones of a
library are spread onto plates, transferred to a substrate for screening, denatured and probed for the presence of
desired sequences.

The DNA sequences used in this invention will usually comprise at least about five codons (15 nucleotides), more
usually at least about 7-15 codons, and most preferably, at least about 35 codons. One or more introns may also be
present. This number of nucleotides is usually about the minimal length required for a successful probe that would
hybridize specifically with a BRCA1-encoding sequence.

Techniques for nucleic acid manipulation are described generally, for example, in Sambrook et al., 1989 or Ausubel
et al., 1992. Reagents useful in applying such techniques, such as restriction enzymes and the like, are widely known
in the art and commercially available from such vendors as New England BioLabs, Boehringer Mannheim, Amersham,
Promega Biotec, U.S. Biochemicals, New England Nuclear, and a number of other sources. The recombinant nucleic
acid sequences used to produce fusion proteins of the present invention may be derived from natural or synthetic
sequences. Many natural gene sequences are obtainable from various cDNA or from genomic libraries using appropriate
probes. See GenBank, National Institutes of Health.

"BRCA1 Region - refers to a portion of human chromosome 17q21 bounded by the markers tdj 1474 and U5R 

This region contains the BRCA1 locus, including the BRCA1 gene 

As used herein, the terms "BRCA1 locus," "BRCA1 allele" and "BRCA1 region" all refer to the double-stranded
DNA comprising the locus, allele, or region, as well as either of the single-stranded DNAs comprising the locus, allele
or region.

As used herein, a "portion" of the BRCA1 locus or region or allele is defined as having a minimal size of at least
about eight nucleotides, or preferably about 15 nucleotides, or more preferably at least about 25 nucleotides, and may
have a minimal size of at least about 40 nucleotides.

"BRCA1 protein" or "BRCA1 polypeptide" refer to a protein or polypeptide encoded by the BRCA1 locus var- 
iants or fragments thereof The term "polypeptide" refers to a polymer of ammo acids and its equivalent and does not 
refer to a specific length of the product; thus, peptides, oligopeptides and proteins are included within the definition of 
a polypeptide. This term also does not refer to, or exclude modifications of the polypeptide, for example, glycosylations, 
acetylations, phosphorylations, and the like Included within the definition are, for example polypeptides containing one 
or more analogs of an amino acid (including, for example unnatural ammo acids, etc ), polypeptides with substituted 
linkages as well as other modifications known in the art, both naturally and non-naturaily occurring Ordinarily, such 
polypeptides will be at least about 50% homologous to the native BRCA1 sequence preferably in excess of about 90% 
and more preferably at least about 95% homologous Also included are proteins encoded by DNA which hybridize under 
high or low stringency conditions to BRCA1 -encoding nucleic acids and closely related polypeptides or proteins retrieved 
by antisera to the BRCA1 protein(s). 

The length of polypeptide sequences compared for homology will generally be at least about 16 amino acids, usually
at least about 20 residues, more usually at least about 24 residues, typically at least about 28 residues, and preferably
more than about 35 residues.
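As a rough illustration of comparing two polypeptide sequences over a minimum window, the sketch below computes percent identity between two sequences that are assumed to be already aligned (equal length, with '-' marking gaps). The function name `percent_identity`, the default window of 16 residues (the lower bound mentioned above), and the use of plain identity in place of a weighted homology score are assumptions made for illustration; this is not the scoring used by any particular sequence analysis package.

```python
def percent_identity(seq_a: str, seq_b: str, min_length: int = 16) -> float:
    """Percent of identical residues between two pre-aligned sequences.

    Assumptions (illustrative only): both sequences are already aligned to
    equal length, '-' marks a gap, and gapped columns are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Keep only the columns in which both sequences carry a residue.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if len(pairs) < min_length:
        raise ValueError(
            f"need at least {min_length} compared residues, got {len(pairs)}"
        )
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Two arbitrary 20-residue strings that differ at the last position.
    print(percent_identity("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLMNPQRSTVWA"))
```

Raising an error for short comparisons mirrors the idea that a homology figure is only meaningful over a sufficiently long stretch of residues.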

"Operably linked" refers to a juxtaposition wherein the components so described are in a relationship permitting 
them to function in their intended manner. For instance, a promoter is operably linked to a coding sequence if the promoter
affects its transcription or expression.

"Probes". Polynucleotide polymorphisms associated with BRCA1 alleles which predispose to certain cancers or 
are associated with most cancers are detected by hybridization with a polynucleotide probe which forms a stable hybrid
with that of the target sequence, under stringent to moderately stringent hybridization and wash conditions. If it is
expected that the probes will be perfectly complementary to the target sequence, stringent conditions will be used.
Hybridization stringency may be lessened if some mismatching is expected, for example, if variants are expected with
the result that the probe will not be completely complementary. Conditions are chosen which rule out
nonspecific/adventitious bindings, that is, which minimize noise. Since such indications identify neutral DNA
polymorphisms as well as mutations, these indications need further analysis to demonstrate detection of a BRCA1
susceptibility allele.

Probes for BRCA1 alleles may be derived from the sequences of the BRCA1 region or its cDNAs. The probes may
be of any suitable length, which span all or a portion of the BRCA1 region, and which allow specific hybridization to the
BRCA1 region. If the target sequence contains a sequence identical to that of the probe, the probes may be short, e.g.,
in the range of about 8-30 base pairs, since the hybrid will be relatively stable under even stringent conditions. If some
degree of mismatch is expected with the probe, i.e., if it is suspected that the probe will hybridize to a variant region, a
longer probe may be employed which hybridizes to the target sequence with the requisite specificity.

The probes will include an isolated polynucleotide attached to a label or reporter molecule and may be used to
isolate other polynucleotide sequences having sequence similarity by standard methods. For techniques for preparing
and labeling probes see, e.g., Sambrook et al., 1989 or Ausubel et al., 1992. Other similar polynucleotides may be
selected by using homologous polynucleotides. Alternatively, polynucleotides encoding these or similar polypeptides
may be synthesized or selected by use of the redundancy in the genetic code. Mutations may be introduced to modify
the properties of the polypeptide, perhaps to change ligand-binding affinities, interchain affinities, or the polypeptide
degradation or turnover rate.

Probes comprising synthetic oligonucleotides or other polynucleotides of the present invention may be derived from
naturally occurring or recombinant single- or double-stranded polynucleotides, or be chemically synthesized. Probes
may also be labeled by nick translation, Klenow fill-in reaction, or other methods known in the art.

Portions of the polynucleotide sequence having at least about eight nucleotides, usually at least about 15
nucleotides, from a polynucleotide sequence encoding BRCA1 are preferred as probes.

"Protein modifications or fragments" are provided by the present invention for BRCA1 polypeptides or fragments
thereof which are substantially homologous to primary structural sequence but which include, e.g., in vivo or in
vitro chemical and biochemical modifications or which incorporate unusual amino acids. Such modifications include, for
example, acetylation, carboxylation, phosphorylation, glycosylation, ubiquitination, labeling, e.g., with radionuclides, and
various enzymatic modifications, as will be readily appreciated by those well skilled in the art. A variety of methods for
labeling polypeptides and of substituents or labels useful for such purposes are well known in the art, and include
ligands which bind to labeled antiligands (e.g., antibodies), fluorophores and chemiluminescent agents. See, e.g.,
Sambrook et al., 1989 or Ausubel et al., 1992.

Besides substantially full-length polypeptides, the present invention provides for biologically active fragments of the
polypeptides. Significant biological activities include ligand-binding, immunological activity and other biological activities
characteristic of BRCA1 polypeptides. Immunological activities include both immunogenic function in a target immune
system, as well as sharing of immunological epitopes for binding, serving as either a competitor or substitute antigen
for an epitope of the BRCA1 protein. As used herein, "epitope" refers to an antigenic determinant of a polypeptide.
Production of antibodies specific for BRCA1 polypeptides or fragments thereof is described below.

The present invention also provides for fusion polypeptides, comprising BRCA1 polypeptides and fragments.
Homologous polypeptides may be fusions between two or more BRCA1 polypeptide sequences or between the sequences
of BRCA1 and a related protein. Likewise, heterologous fusions may be constructed which would exhibit a combination
of properties or activities of the derivative proteins. For example, ligand-binding or other domains may be "swapped"
between different new fusion polypeptides or fragments. Such homologous or heterologous fusion polypeptides may
display, for example, altered strength or specificity of binding. Fusion partners include immunoglobulins and bacterial
β-galactosidase.

"Protein purification" refers to various methods for the isolation of the BRCA1 polypeptides from other biological
material, such as from cells transformed with recombinant nucleic acids encoding BRCA1, and are well known in the
art. For example, such polypeptides may be purified by immuno-affinity chromatography employing, e.g., the

de,C ; ,Md »“ ■•ub.Mntl.il> arc asad » 

dascnDe a -.oiein a poiypept.de whch ms Peen sepatated Item components witft accompany « m ns 

seqceoce A ,yO.,an„.ll, P®. = ^ A » «d P, a nomoe, 

or ot r;R2:r P :otr s T a r c^^,; s y 

conservative amino acid while typically introducing or removing a sequence recognition site. Alternatively, it is performed

■nr.:™ 

may also be more distant from the coding region, which affect the expression of the gene (including

„t me gan. and ■»«. #» . Jrt is •sab.ant.lt, nomotogou.' «. 

Sbb^S^‘^*»' « «*’ »'™* me 

sssr 

»*»■ 

Sr=S-=EEru=r^=rr=.^-ssr 

3SHSs=ri^."j==ir=-.r=: 

The terms "substantial homology" or "substantial identity", when referring to polypeptides, indicate that the
polypeptide or protein in question exhibits at least about 30% identity with an entire naturally occurring protein or a
portion thereof, and usually at least about 70% identity.

"Substantially similar function" refers to the function of a modified nucleic acid or a modified protein, with
reference to the wild-type BRCA1 nucleic acid or wild-type BRCA1 polypeptide. The modified polypeptide will be
substantially homologous to the wild-type BRCA1 polypeptide and will have substantially the same function. The modified
polypeptide may have an altered amino acid sequence and/or may contain modified amino acids. In addition to the
similarity of function, the modified polypeptide may have other useful properties, such as a longer half-life. The similarity
of function (activity) of the modified polypeptide may be substantially the same as the activity of the wild-type BRCA1
polypeptide. Alternatively, the similarity of function (activity) of the modified polypeptide may be higher than the activity
of the wild-type BRCA1 polypeptide. The modified polypeptide is synthesized using conventional techniques, or is
encoded by a modified nucleic acid and produced using conventional techniques. The modified nucleic acid is prepared
by conventional techniques. A nucleic acid with a function substantially similar to the wild-type BRCA1 gene function
produces the modified protein described above.

Homology, for polypeptides, is typically measured using sequence analysis software. See, e.g., the Sequence
Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 910 University
Avenue, Madison, Wisconsin 53705. Protein analysis software matches similar sequences using measure of homology
assigned to various substitutions, deletions and other modifications. Conservative substitutions typically include
substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid;
asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine.
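The grouping of conservative substitutions can be expressed as a small lookup. The sketch below assumes the list above is read as seven groups (Gly/Ala; Val/Ile/Leu; Asp/Glu; Asn/Gln; Ser/Thr; Lys/Arg; Phe/Tyr), written here in standard one-letter amino acid codes. The names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical and are not part of any software package cited in this specification.

```python
# Conservative substitution groups, as one-letter amino acid codes.
# Assumption: the groups are the pairs/triples read from the text above.
CONSERVATIVE_GROUPS = [
    {"G", "A"},       # glycine, alanine
    {"V", "I", "L"},  # valine, isoleucine, leucine
    {"D", "E"},       # aspartic acid, glutamic acid
    {"N", "Q"},       # asparagine, glutamine
    {"S", "T"},       # serine, threonine
    {"K", "R"},       # lysine, arginine
    {"F", "Y"},       # phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if replacing `original` with `replacement` stays within one group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("K", "R"))  # lysine -> arginine, same group
    print(is_conservative("G", "V"))  # glycine -> valine, different groups
```

Residues that appear in none of the groups (for example cysteine or proline) are simply never reported as conservative by this sketch.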

A polypeptide "fragment," "portion" or "segment" is a stretch of amino acid residues of at least about five to
seven contiguous amino acids, often at least about seven to nine contiguous amino acids, typically at least about nine
to 13 contiguous amino acids and, most preferably, at least about 20 to 30 or more contiguous amino acids.

The polypeptides of the present invention, if soluble, may be coupled to a solid-phase support, e.g., nitrocellulose,
nylon, column packing materials (e.g., Sepharose beads), magnetic beads, glass wool, plastic, metal, polymer gels,
cells, or other substrates. Such supports may take the form, for example, of beads, wells, dipsticks, or membranes.

"Target region" refers to a region of the nucleic acid which is amplified and/or detected. The term "target
sequence" refers to a sequence with which a probe or primer will form a stable hybrid under desired conditions.

The practice of the present invention employs, unless otherwise indicated, conventional techniques of chemistry,
molecular biology, microbiology, recombinant DNA, genetics, and immunology. See, e.g., Maniatis et al., 1982;
Sambrook et al., 1989; Ausubel et al., 1992; Glover, 1985; Anand, 1992; Guthrie & Fink, 1991. A general discussion of
techniques and materials for human gene mapping, including mapping of human chromosome 17q, is provided, e.g.,
in White and Lalouel, 1988.

Preparation of recombinant or chemically synthesized nucleic acids; vectors, transformation, host cells

Large amounts of the polynucleotides of the present invention may be produced by replication in a suitable host
cell. Natural or synthetic polynucleotide fragments coding for a desired fragment will be incorporated into recombinant
polynucleotide constructs, usually DNA constructs, capable of introduction into and replication in a prokaryotic or
eukaryotic cell. Usually the polynucleotide constructs will be suitable for replication in a unicellular host, such as yeast or
bacteria, but may also be intended for introduction to (with and without integration within the genome) cultured
mammalian or plant or other eukaryotic cell lines. The purification of nucleic acids produced by the methods of the present
invention is described, e.g., in Sambrook et al., 1989 or Ausubel et al., 1992.

The polynucleotides of the present invention may also be produced by chemical synthesis, e.g., by the
phosphoramidite method described by Beaucage & Caruthers, 1981 or the triester method according to Matteucci and
Caruthers, 1981, and may be performed on commercial, automated oligonucleotide synthesizers. A double-stranded
fragment may be obtained from the single-stranded product of chemical synthesis either by synthesizing the complementary
strand and annealing the strands together under appropriate conditions or by adding the complementary strand using DNA
polymerase with an appropriate primer sequence.

Polynucleotide constructs prepared for introduction into a prokaryotic or eukaryotic host may comprise a replication
system recognized by the host, including the intended polynucleotide fragment encoding the desired polypeptide, and
will preferably also include transcription and translational initiation regulatory sequences operably linked to the
polypeptide encoding segment. Expression vectors may include, for example, an origin of replication or autonomously
replicating sequence (ARS) and expression control sequences, a promoter, an enhancer and necessary processing
information sites, such as ribosome-binding sites, RNA splice sites, polyadenylation sites, transcriptional terminator
sequences, and mRNA stabilizing sequences. Secretion signals may also be included where appropriate, whether from a
native BRCA1 protein or from other receptors or from secreted polypeptides of the same or related species, which allow
the protein to cross and/or lodge in cell membranes, and thus attain its functional topology, or be secreted from the cell.
Such vectors may be prepared by means of standard recombinant techniques well known in the art and discussed, for
example, in Sambrook et al., 1989 or Ausubel et al., 1992.

An appropriate promoter and other necessary vector sequences will be selected so as to be functional in the host,
and may include, when appropriate, those naturally associated with BRCA1 genes. Examples of workable combinations
of cell lines and expression vectors are described in Sambrook et al., 1989 or Ausubel et al., 1992; see also, e.g.,
Metzger et al., 1988. Many useful vectors are known in the art and may be obtained from such vendors as Stratagene,
New England Biolabs, Promega Biotech, and others. Promoters such as the trp, lac and phage promoters, tRNA
promoters and glycolytic enzyme promoters may be used in prokaryotic hosts. Useful yeast promoters include promoter
regions for metallothionein, 3-phosphoglycerate kinase or other glycolytic enzymes such as enolase or
glyceraldehyde-3-phosphate dehydrogenase, enzymes responsible for maltose and galactose utilization, and others. Vectors and
promoters suitable for use in yeast expression are further described in Hitzeman et al., EP 73,675A. Appropriate
non-native mammalian promoters might include the early and late promoters from SV40 (Fiers et al., 1978) or promoters
derived from murine Moloney leukemia virus, mouse tumor virus, avian sarcoma viruses, adenovirus II, bovine papilloma
virus or polyoma. In addition, the construct may be joined to an amplifiable gene (e.g., DHFR) so that multiple copies of
the gene may be made. For appropriate enhancer and other expression control sequences, see also Enhancers and
Eukaryotic Gene Expression, Cold Spring Harbor Press, Cold Spring Harbor, New York (1983).

While such expression vectors may replicate autonomously, they may also replicate by being inserted into the
genome of the host cell, by methods well known in the art.

Expression and cloning vectors will likely contain a selectable marker, a gene encoding a protein necessary for
survival or growth of a host cell transformed with the vector. The presence of this gene ensures growth of only those
host cells which express the inserts. Typical selection genes encode proteins that a) confer resistance to antibiotics or
other toxic substances, e.g., ampicillin, neomycin, methotrexate, etc.; b) complement auxotrophic deficiencies; or c)
supply critical nutrients not available from complex media, e.g., the gene encoding D-alanine racemase for Bacilli. The
choice of the proper selectable marker will depend on the host cell, and appropriate markers for different hosts are well
known in the art.

The vectors containing the nucleic acids of interest can be transcribed in vitro, and the resulting RNA introduced
into the host cell by well-known methods, e.g., by injection (see, Kubo et al., 1988), or the vectors can be introduced
directly into host cells by methods well known in the art, which vary depending on the type of cellular host, including
electroporation; transfection employing calcium chloride, rubidium chloride, calcium phosphate, DEAE-dextran, or other
substances; microprojectile bombardment; lipofection; infection (where the vector is an infectious agent, such as a
retroviral genome); and other methods. See generally, Sambrook et al., 1989 and Ausubel et al., 1992. The introduction
of the polynucleotides into the host cell by any method known in the art, including, inter alia, those described above,
will be referred to herein as "transformation." The cells into which have been introduced nucleic acids described above
are meant to also include the progeny of such cells.

Large quantities of the nucleic acids and polypeptides of the present invention may be prepared by expressing the
BRCA1 nucleic acids or portions thereof in vectors or other expression vehicles in compatible prokaryotic or eukaryotic
host cells. The most commonly used prokaryotic hosts are strains of Escherichia coli, although other prokaryotes, such
as Bacillus subtilis or Pseudomonas, may also be used.

Mammalian or other eukaryotic host cells, such as those of yeast, filamentous fungi, plant, insect, or amphibian or
avian species, may also be useful for production of the proteins of the present invention. Propagation of mammalian
cells in culture is per se well known. See, Jakoby and Pastan, 1979. Examples of commonly used mammalian host cell
lines are VERO and HeLa cells, Chinese hamster ovary (CHO) cells, and WI38, BHK, and COS cell lines, although it
will be appreciated by the skilled practitioner that other cell lines may be appropriate, e.g., to provide higher expression,
desirable glycosylation patterns, or other features.

Clones are selected by using markers depending on the mode of the vector construction. The marker may be on
the same or a different DNA molecule, preferably the same DNA molecule. In prokaryotic hosts, the transformant may
be selected, e.g., by resistance to ampicillin, tetracycline or other antibiotics. Production of a particular product based
on temperature sensitivity may also serve as an appropriate marker.

Prokaryotic or eukaryotic cells transformed with the polynucleotides of the present invention will be useful not only
for the production of the nucleic acids and polypeptides of the present invention, but also, for example, in studying the
characteristics of BRCA1 polypeptides.

Antisense polynucleotide sequences are useful in preventing or diminishing the expression of the BRCA1 locus, as
will be appreciated by those skilled in the art. For example, polynucleotide vectors containing all or a portion of the
BRCA1 locus or other sequences from the BRCA1 region (particularly those flanking the BRCA1 locus) may be placed
under the control of a promoter in an antisense orientation and introduced into a cell. Expression of such an antisense
construct within a cell will interfere with BRCA1 transcription and/or translation and/or replication.
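To make the notion of an antisense orientation concrete, the sketch below computes the reverse complement of a DNA strand, which is the sequence read when an insert is placed in the opposite orientation relative to a promoter. The function name `reverse_complement` and the short test sequence are arbitrary illustrations (the sequence is not taken from BRCA1), and only the four unambiguous bases are handled.

```python
# Translation table for Watson-Crick base pairing (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence containing only A, C, G, T."""
    unexpected = set(dna.upper()) - set("ACGT")
    if unexpected:
        raise ValueError(f"unsupported characters: {sorted(unexpected)}")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    sense = "ATGCGTAC"  # arbitrary example sequence
    antisense = reverse_complement(sense)
    print(sense, "->", antisense)
```

Applying the function twice returns the original sequence, which is a convenient sanity check when preparing constructs in either orientation.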

The probes and primers based on the BRCA1 gene sequences disclosed herein are used to identify homologous
BRCA1 gene sequences and proteins in other species. These BRCA1 gene sequences and proteins are used in the
diagnostic/prognostic, therapeutic and drug screening methods described herein for the species from which they have
been isolated.

Methods of Use: Nucleic Acid Diagnosis and Diagnostic Kits

In order to detect the presence of a BRCA1 allele predisposing an individual to cancer, a biological sample such as
blood is prepared and analyzed for the presence or absence of susceptibility alleles of BRCA1. In order to detect the
presence of neoplasia, the progression toward malignancy of a precursor lesion, or as a prognostic indicator, a biological
sample of the lesion is prepared and analyzed for the presence or absence of mutant alleles of BRCA1. Results of these
tests and interpretive information are returned to the health care provider for communication to the tested individual.
Such diagnoses may be performed by diagnostic laboratories, or, alternatively, diagnostic kits are manufactured and
sold to health care providers or to private individuals for self-diagnosis.

Initially, the screening method involves amplification of the relevant BRCA1 sequences. In another preferred
embodiment of the invention, the screening method involves a non-PCR based strategy. Such screening methods include
two-step label amplification methodologies that are well known in the art. Both PCR and non-PCR based screening
strategies can detect target sequences with a high level of sensitivity.

The most popular method used today is target amplification. Here, the target nucleic acid sequence is amplified
with polymerases. One particularly preferred method using polymerase-driven amplification is the polymerase chain
reaction (PCR). The polymerase chain reaction and other polymerase-driven amplification assays can achieve over a
million-fold increase in copy number through the use of polymerase-driven amplification cycles. Once amplified, the
resulting nucleic acid can be sequenced or used as a substrate for DNA probes.
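The figure of over a million-fold can be checked with simple arithmetic: if each cycle ideally doubles the number of copies, n cycles give a 2^n-fold increase. The sketch below is a minimal calculation under that idealized assumption. The `efficiency` parameter (the fraction of templates copied in each cycle) and both function names are illustrative and are not taken from this specification.

```python
import math


def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Fold increase in copy number after `cycles` rounds of amplification.

    Idealized model: each cycle multiplies the copy number by (1 + efficiency),
    where efficiency = 1.0 means perfect doubling.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    return (1.0 + efficiency) ** cycles


def cycles_for_fold(target_fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles reaching at least `target_fold`."""
    if target_fold < 1.0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("target_fold must be >= 1 and efficiency within (0, 1]")
    return math.ceil(math.log(target_fold) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    print(fold_amplification(20))  # 2**20 = 1,048,576-fold with perfect doubling
    print(cycles_for_fold(1e6))    # cycles needed to exceed a million-fold
```

With perfect doubling, 20 cycles already exceed a million-fold; a lower per-cycle efficiency simply increases the number of cycles required.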

When the probes are used to detect the presence of the target sequences (for example, in screening for cancer
susceptibility), the biological sample to be analyzed, such as blood or serum, may be treated, if desired, to extract the
nucleic acids. The sample nucleic acid may be prepared in various ways to facilitate detection of the target sequence,
e.g., denaturation, restriction digestion, electrophoresis or dot blotting. The targeted region of the analyte nucleic acid
usually must be at least partially single-stranded to form hybrids with the targeting sequence of the probe. If the sequence
is naturally single-stranded, denaturation will not be required. However, if the sequence is double-stranded, the sequence
will probably need to be denatured. Denaturation can be carried out by various techniques known in the art.

Analyte nucleic acid and probe are incubated under conditions which promote stable hybrid formation of the target
sequence in the probe with the putative targeted sequence in the analyte. The region of the probes which is used to
bind to the analyte can be made completely complementary to the targeted region of human chromosome 17q. Therefore,
high stringency conditions are desirable in order to prevent false positives. However, conditions of high stringency are
used only if the probes are complementary to regions of the chromosome which are unique in the genome. The stringency
of hybridization is determined by a number of factors during hybridization and during the washing procedure, including
temperature, ionic strength, base composition, probe length, and concentration of formamide. These factors are outlined
in, for example, Maniatis et al., 1982 and Sambrook et al., 1989. Under certain circumstances, the formation of higher
order hybrids, such as triplexes, quadraplexes, etc., may be desired to provide the means of detecting target sequences.
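The way these factors combine can be illustrated with a widely quoted empirical approximation for the melting temperature of long DNA-DNA hybrids. The sketch below is an assumption-laden illustration, not a formula given in this specification: the coefficients (81.5, 16.6, 0.41, 0.61, 500) vary slightly between references, the approximation is not intended for short oligonucleotides, and the function name `estimate_tm_celsius` and its default values are hypothetical.

```python
import math


def estimate_tm_celsius(
    sequence: str,
    sodium_molar: float = 0.15,
    formamide_percent: float = 0.0,
) -> float:
    """Approximate melting temperature (deg C) of a long DNA-DNA hybrid.

    Empirical approximation (coefficients are assumptions; sources differ):
        Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    where L is the hybrid length in base pairs.
    """
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    if sodium_molar <= 0:
        raise ValueError("sodium_molar must be positive")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return (
        81.5
        + 16.6 * math.log10(sodium_molar)  # ionic strength
        + 0.41 * gc_percent                # base composition
        - 0.61 * formamide_percent         # formamide concentration
        - 500.0 / len(seq)                 # probe length
    )


if __name__ == "__main__":
    probe = "ACGT" * 50  # arbitrary 200-nt sequence with 50% GC content
    print(estimate_tm_celsius(probe, sodium_molar=1.0))
    print(estimate_tm_celsius(probe, sodium_molar=1.0, formamide_percent=50.0))
```

In this model, raising the wash temperature toward the estimated melting temperature, lowering the salt concentration, or adding formamide each increases stringency, which is consistent with the qualitative list of factors given above.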

Detection, if any, of the resulting hybrid is usually accomplished by the use of labeled probes. Alternatively, the
probe may be unlabeled, but may be detectable by specific binding with a ligand which is labeled, either directly or
indirectly. Suitable labels, and methods for labeling probes and ligands, are known in the art, and include, for example,
radioactive labels which may be incorporated by known methods (e.g., nick translation, random priming or kinasing),
biotin, fluorescent groups, chemiluminescent groups (e.g., dioxetanes, particularly triggered dioxetanes), enzymes,
antibodies and the like. Variations of this basic scheme are known in the art, and include those variations that facilitate
separation of the hybrids to be detected from extraneous materials and/or that amplify the signal from the labeled moiety.
A number of these variations are reviewed in, e.g., Matthews & Kricka, 1988; Landegren et al.; U.S.
Patent 4,868,105; and in EPO Publication No. 225,807.

As noted above, non-PCR based screening assays are also contemplated in this invention. An exemplary non-PCR
based procedure is provided in Example 11. This procedure hybridizes a nucleic acid probe (or an analog such as a
methyl phosphonate backbone replacing the normal phosphodiester) to the low level DNA target. This probe may have
an enzyme covalently linked to the probe, such that the covalent linkage does not interfere with the specificity of
the hybridization. This enzyme-probe-conjugate-target nucleic acid complex can then be isolated away from the free probe
enzyme conjugate and a substrate is added for enzyme detection. Enzymatic activity is observed as a change in color
development or luminescent output resulting in a 10³ to 10⁶ increase in sensitivity. For an example relating to the
preparation of oligodeoxynucleotide-alkaline phosphatase conjugates and their use as hybridization probes see
Jablonski et al.

Two-step label amplification methodologies are known in the art. These assays work on the principle that a small
ligand (such as digoxigenin, biotin, or the like) is attached to a nucleic acid probe capable of specifically binding BRCA1.
Exemplary probes are provided in Table 9 of this patent application and additionally include the nucleic acid probe
corresponding to nucleotide positions 3631 to 3930 of SEQ ID NO:1. Allele specific probes are also contemplated within
the scope of this example and exemplary allele specific probes include probes encompassing the predisposing mutations
summarized in Tables 11 and 12 of this patent application.

In one example, the small ligand attached to the nucleic acid probe is specifically recognized by an antibody-enzyme
conjugate. In one embodiment of this example, digoxigenin is attached to the nucleic acid probe. Hybridization is detected
by an antibody-alkaline phosphatase conjugate which turns over a chemiluminescent substrate. For methods for labeling
nucleic acid probes according to this embodiment see Martin et al., 1990. In a second example, the small ligand is
recognized by a second ligand-enzyme conjugate that is capable of specifically complexing to the first ligand. A well
known embodiment of this example is the biotin-avidin type of interactions. For methods for labeling nucleic acid probes
and their use in biotin-avidin based assays see Rigby et al., 1977 and Nguyen et al., 1992.

It is also contemplated within the scope of this invention that the nucleic acid probe assays of this invention will
employ a cocktail of nucleic acid probes capable of detecting BRCA1. Thus, in one example to detect the presence of
BRCA1 in a cell sample, more than one probe complementary to BRCA1 is employed and in particular the number of
different probes is alternatively 2, 3, or 5 different nucleic acid probe sequences. In another example, to detect the
presence of mutations in the BRCA1 gene sequence in a patient, more than one probe complementary to BRCA1 is
employed where the cocktail includes probes capable of binding to the allele-specific mutations identified in populations
of patients with alterations in BRCA1. In this embodiment, any number of probes can be used, and will preferably include
probes corresponding to the major gene mutations identified as predisposing an individual to breast cancer. Some
candidate probes contemplated within the scope of the invention include probes that include the allele-specific mutations
identified in Tables 11 and 12 and those that have the BRCA1 regions corresponding to SEQ ID NO:1 both 5' and 3' to
the mutation site.

Methods of Use: Peptide Diagnosis and Diagnostic Kits

The neoplastic condition of lesions can also be detected on the basis of the alteration of wild-type BRCA1
polypeptide. Such alterations can be determined by sequence analysis in accordance with conventional techniques. More
preferably, antibodies (polyclonal or monoclonal) are used to detect differences in, or the absence of, BRCA1 peptides. The
antibodies may be prepared as discussed above under the heading "Antibodies" and as further shown in Examples 12
and 13. Other techniques for raising and purifying antibodies are well known in the art and any such techniques may
be chosen to achieve the preparations claimed in this invention. In a preferred embodiment of the invention, antibodies
will immunoprecipitate BRCA1 proteins from solution as well as react with BRCA1 protein on Western or immunoblots
of polyacrylamide gels. In another preferred embodiment, antibodies will detect BRCA1 proteins in paraffin or frozen
tissue sections, using immunocytochemical techniques.

Preferred embodiments relating to methods for detecting BRCA1 or its mutations include enzyme linked
immunosorbent assays (ELISA), radioimmunoassays (RIA), immunoradiometric assays (IRMA) and immunoenzymatic assays
(IEMA), including sandwich assays using monoclonal and/or polyclonal antibodies. Exemplary sandwich assays are
described by David et al. in U.S. Patent Nos. 4,376,110 and 4,486,530, hereby incorporated by reference, and
exemplified in Example 14.

Methods of Use: Drug Screening

This invention is particularly useful for screening compounds by using the BRCA1 polypeptide or binding fragment
thereof in any of a variety of drug screening techniques.

The BRCA1 polypeptide or fragment employed in such a test may either be free in solution, affixed to a solid support,
or borne on a cell surface. One method of drug screening utilizes eucaryotic or procaryotic host cells which are stably
transformed with recombinant polynucleotides expressing the polypeptide or fragment, preferably in competitive binding
assays. Such cells, either in viable or fixed form, can be used for standard binding assays. One may measure, for
example, the formation of complexes between a BRCA1 polypeptide or fragment and the agent being tested, or
examine the degree to which the formation of a complex between a BRCA1 polypeptide or fragment and a known ligand
is interfered with by the agent being tested.

Thus, the present invention provides methods of screening for drugs comprising contacting such an agent with a
BRCA1 polypeptide or fragment thereof and assaying (i) for the presence of a complex between the agent and the
BRCA1 polypeptide or fragment, or (ii) for the presence of a complex between the BRCA1 polypeptide or fragment and
a ligand, by methods well known in the art. In such competitive binding assays the BRCA1 polypeptide or fragment is
typically labeled. Free BRCA1 polypeptide or fragment is separated from that present in a protein:protein complex, and
the amount of free (i.e., uncomplexed) label is a measure of the binding of the agent being tested to BRCA1 or its
interference with BRCA1:ligand binding, respectively.

Another technique for drug screening provides high throughput screening for compounds having suitable binding
affinity to the BRCA1 polypeptides and is described in detail in Geysen, PCT published application WO 84/03564,
published on September 13, 1984. Briefly stated, large numbers of different small peptide test compounds are synthesized
on a solid substrate, such as plastic pins or some other surface. The peptide test compounds are reacted with BRCA1
polypeptide and washed. Bound BRCA1 polypeptide is then detected by methods well known in the art.

Purified BRCA1 can be coated directly onto plates for use in the aforementioned drug screening techniques.
However, non-neutralizing antibodies to the polypeptide can be used to capture antibodies to immobilize the BRCA1
polypeptide on the solid phase.

This invention also contemplates the use of competitive drug screening assays in which neutralizing antibodies
capable of specifically binding the BRCA1 polypeptide compete with a test compound for binding to the BRCA1
polypeptide or fragments thereof. In this manner, the antibodies can be used to detect the presence of any peptide which shares
one or more antigenic determinants of the BRCA1 polypeptide.

A further technique for drug screening involves the use of host eukaryotic cell lines or cells (such as described
above) which have a nonfunctional BRCA1 gene. These host cell lines or cells are defective at the BRCA1 polypeptide
level. The host cell lines or cells are grown in the presence of drug compound. The rate of growth of the host cells is
measured to determine if the compound is capable of regulating the growth of BRCA1 defective cells.

Methods of Use: Rational Drug Design

The goal of rational drug design is to produce structural analogs of biologically active polypeptides of interest or of
small molecules with which they interact (e.g., agonists, antagonists, inhibitors) in order to fashion drugs which are, for
example, more active or stable forms of the polypeptide, or which, e.g., enhance or interfere with the function of a
polypeptide in vivo. See, e.g., Hodgson, 1991. In one approach, one first determines the three-dimensional structure of
a protein of interest (e.g., BRCA1 polypeptide) or, for example, of the BRCA1-receptor or ligand complex, by x-ray
crystallography, by computer modeling or, most typically, by a combination of approaches. Less often, useful information
regarding the structure of a polypeptide may be gained by modeling based on the structure of homologous proteins. An
example of rational drug design is the development of HIV protease inhibitors (Erickson et al., 1990). In addition, peptides
(e.g., BRCA1 polypeptide) are analyzed by an alanine scan (Wells, 1991). In this technique, an amino acid residue is
replaced by Ala, and its effect on the peptide's activity is determined. Each of the amino acid residues of the peptide is
analyzed in this manner to determine the important regions of the peptide.
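The bookkeeping of an alanine scan can be illustrated with a minimal Python sketch. It only enumerates the single-residue Ala variants that would be tested; the activity measurement is experimental and is not modeled. The function name `alanine_scan` and the four-residue peptide are hypothetical and are not taken from the BRCA1 sequence.

```python
def alanine_scan(peptide):
    """Return (position, original_residue, variant_sequence) for every
    single-residue substitution to alanine. Positions are 1-based."""
    variants = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue  # already alanine; the substitution would change nothing
        variants.append((i + 1, residue, peptide[:i] + "A" + peptide[i + 1:]))
    return variants


# Hypothetical peptide in one-letter code, used only for illustration.
for position, residue, variant in alanine_scan("MDLS"):
    print(f"{residue}{position}A -> {variant}")
```

Each variant would then be assayed, and positions whose substitution changes activity mark the important regions of the peptide.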

It is also possible to isolate a target-specific antibody, selected by a functional assay, and then to solve its crystal
structure. In principle, this approach yields a pharmacore upon which subsequent drug design can be based. It is possible
to bypass protein crystallography altogether by generating anti-idiotypic antibodies (anti-ids) to a functional,
pharmacologically active antibody. As a mirror image of a mirror image, the binding site of the anti-ids would be expected to be
an analog of the original receptor. The anti-id could then be used to identify and isolate peptides from banks of chemically
or biologically produced banks of peptides. Selected peptides would then act as the pharmacore.

Thus, one may design drugs which have, e.g., improved BRCA1 polypeptide activity or stability or which act as
inhibitors, agonists, antagonists, etc. of BRCA1 polypeptide activity. By virtue of the availability of cloned BRCA1
sequences, sufficient amounts of the BRCA1 polypeptide may be made available to perform such analytical studies as
x-ray crystallography. In addition, the knowledge of the BRCA1 protein sequence provided herein will guide those
employing computer modeling techniques in place of, or in addition to, x-ray crystallography.

Methods of Use: Gene Therapy

According to the present invention, a method is also provided of supplying wild-type BRCA1 function to a cell which
carries mutant BRCA1 alleles. Supplying such a function should suppress neoplastic growth of the recipient cells. The
wild-type BRCA1 gene or a part of the gene may be introduced into the cell in a vector such that the gene remains
extrachromosomal. In such a situation, the gene will be expressed by the cell from the extrachromosomal location. If a
gene fragment is introduced and expressed in a cell carrying a mutant BRCA1 allele, the gene fragment should encode
a part of the BRCA1 protein which is required for non-neoplastic growth of the cell. More preferred is the situation where
the wild-type BRCA1 gene or a part thereof is introduced into the mutant cell in such a way that it recombines with the
endogenous mutant BRCA1 gene present in the cell. Such recombination requires a double recombination event which
results in the correction of the BRCA1 gene mutation. Vectors for introduction of genes both for recombination and for
extrachromosomal maintenance are known in the art, and any suitable vector may be used. Methods for introducing
DNA into cells such as electroporation, calcium phosphate co-precipitation and viral transduction are known in the art,
and the choice of method is within the competence of the routineer. Cells transformed with the wild-type BRCA1 gene
can be used as model systems to study cancer remission and drug treatments which promote such remission.

As generally discussed above, the BRCA1 gene or fragment, where applicable, may be employed in gene therapy
methods in order to increase the amount of the expression products of such genes in cancer cells. Such gene therapy
is particularly appropriate for use in both cancerous and pre-cancerous cells, in which the level of BRCA1 polypeptide
is absent or diminished compared to normal cells. It may also be useful to increase the level of expression of a given
BRCA1 gene even in those tumor cells in which the mutant gene is expressed at a "normal" level, but the gene product
is not fully functional.

Gene therapy would be carried out according to generally accepted methods, for example, as described by Friedman,
1991. Cells from a patient's tumor would be first analyzed by the diagnostic methods described above, to ascertain
the production of BRCA1 polypeptide in the tumor cells. A virus or plasmid vector (see further details below), containing
a copy of the BRCA1 gene linked to expression control elements and capable of replicating inside the tumor cells, is
prepared. Suitable vectors are known, such as disclosed in U.S. Patent 5,252,479 and PCT published application WO
93/07282. The vector is then injected into the patient, either locally at the site of the tumor or systemically (in order to
reach any tumor cells that may have metastasized to other sites). If the transfected gene is not permanently incorporated
into the genome of each of the targeted tumor cells, the treatment may have to be repeated periodically.

Gene transfer systems known in the art may be useful in the practice of the gene therapy methods of the present
invention. These include viral and nonviral transfer methods. A number of viruses have been used as gene transfer
vectors, including papovaviruses, e.g., SV40 (Madzak et al., 1992), adenovirus (Berkner, 1992; Berkner et al., 1988;
Gorziglia and Kapikian, 1992; Quantin et al., 1992; Rosenfeld et al., 1992; Wilkinson et al., 1992; Stratford-Perricaudet
et al., 1990), vaccinia virus (Moss, 1992), adeno-associated virus (Muzyczka, 1992; Ohi et al., 1990), herpesviruses
including HSV and EBV (Margolskee, 1992; Johnson et al., 1992; Fink et al., 1992; Breakfield and Geller, 1987; Freese
et al., 1990), and retroviruses of avian (Brandyopadhyay and Temin, 1984; Petropoulos et al., 1992), murine (Miller,
1992; Miller et al., 1985; Sorge et al., 1984; Mann and Baltimore, 1985; Miller et al., 1988), and human origin (Shimada
et al., 1991; Helseth et al., 1990; Page et al., 1990; Buchschacher and Panganiban, 1992). Most human gene therapy
protocols have been based on disabled murine retroviruses.

Nonviral gene transfer methods known in the art include chemical techniques such as calcium phosphate
coprecipitation (Graham and van der Eb, 1973; Pellicer et al., 1980); mechanical techniques, for example microinjection
(Anderson et al., 1980; Gordon et al., 1980; Brinster et al., 1981; Constantini and Lacy, 1981); membrane
fusion-mediated transfer via liposomes (Felgner et al., 1987; Wang and Huang, 1989; Kaneda et al., 1989; Stewart et al., 1992;
Nabel et al., 1990; Lim et al., 1992); and direct DNA uptake and receptor-mediated DNA transfer (Wolff et al., 1990;
Wu et al., 1991; Zenke et al., 1990; Wu et al., 1989b; Wolff et al., 1991; Wagner et al., 1990; Wagner et al., 1991;
Cotten et al., 1990; Curiel et al., 1991a; Curiel et al., 1991b). Viral-mediated gene transfer can be combined with direct
in vivo gene transfer using liposome delivery, allowing one to direct the viral vectors to the tumor cells and not into the
surrounding nondividing cells. Alternatively, the retroviral vector producer cell line can be injected into tumors (Culver
et al., 1992). Injection of producer cells would then provide a continuous source of vector particles. This technique has
been approved for use in humans with inoperable brain tumors.

In an approach which combines biological and physical gene transfer methods, plasmid DNA of any size is combined
with a polylysine-conjugated antibody specific to the adenovirus hexon protein, and the resulting complex is bound to
an adenovirus vector. The trimolecular complex is then used to infect cells. The adenovirus vector permits efficient
binding, internalization, and degradation of the endosome before the coupled DNA is damaged.

Liposome/DNA complexes have been shown to be capable of mediating direct in vivo gene transfer. While in
standard liposome preparations the gene transfer process is nonspecific, localized in vivo uptake and expression have been
reported in tumor deposits, for example, following direct in situ administration (Nabel, 1992).

Gene transfer techniques which target DNA directly to breast and ovarian tissues, e.g., epithelial cells of the breast
or ovaries, are preferred. Receptor-mediated gene transfer, for example, is accomplished by the conjugation of DNA
(usually in the form of covalently closed supercoiled plasmid) to a protein ligand via polylysine. Ligands are chosen on
the basis of the presence of the corresponding ligand receptors on the cell surface of the target cell/tissue type. One
appropriate receptor/ligand pair may include the estrogen receptor and its ligand, estrogen (and estrogen analogues).
These ligand-DNA conjugates can be injected directly into the blood if desired and are directed to the target tissue where
receptor binding and internalization of the DNA-protein complex occurs. To overcome the problem of intracellular
destruction of DNA, coinfection with adenovirus can be included to disrupt endosome function.

The therapy involves two steps which can be performed singly or jointly. In the first step, prepubescent females who
carry a BRCA1 susceptibility allele are treated with a gene delivery vehicle such that some or all of their mammary ductal
epithelial precursor cells receive at least one additional copy of a functional normal BRCA1 allele. In this step, the treated
individuals have reduced risk of breast cancer to the extent that the effect of the susceptible allele has been countered
by the presence of the normal allele. In the second step of a preventive therapy, predisposed young females, in particular
women who have received the proposed gene therapeutic treatment, undergo hormonal therapy to mimic the effects
on the breast of a full term pregnancy.




Methods of Use: Peptide Therapy

Peptides which have BRCA1 activity can be supplied to cells which carry mutant or missing BRCA1 alleles. The
sequence of the BRCA1 protein is disclosed (SEQ ID NO:2). Protein can be produced by expression of the cDNA
sequence in bacteria, for example, using known expression vectors. Alternatively, BRCA1 polypeptide can be extracted
from BRCA1-producing mammalian cells. In addition, the techniques of synthetic chemistry can be employed to
synthesize BRCA1 protein. Any of such techniques can provide the preparation of the present invention which comprises
the BRCA1 protein. The preparation is substantially free of other human proteins. This is most readily accomplished by
synthesis in a microorganism or in vitro.

Active BRCA1 molecules can be introduced into cells by microinjection or by use of liposomes, for example.
Alternatively, some active molecules may be taken up by cells, actively or by diffusion. Extracellular application of the BRCA1
gene product may be sufficient to affect tumor growth. Supply of molecules with BRCA1 activity should lead to partial
reversal of the neoplastic state. Other molecules with BRCA1 activity (for example, peptides, drugs or organic
compounds) may also be used to effect such a reversal. Modified polypeptides having substantially similar function are also
used for peptide therapy.

Methods of Use: Transformed Hosts

Similarly, cells and animals which carry a mutant BRCA1 allele can be used as model systems to study and test for
substances which have potential as therapeutic agents. The cells are typically cultured epithelial cells. These may be
isolated from individuals with BRCA1 mutations, either somatic or germline. Alternatively, the cell line can be engineered
to carry the mutation in the BRCA1 allele, as described above. After a test substance is applied to the cells, the
neoplastically transformed phenotype of the cell is determined. Any trait of neoplastically transformed cells can be assessed,
including anchorage-independent growth, tumorigenicity in nude mice, invasiveness of cells, and growth factor
dependence. Assays for each of these traits are known in the art.

Animals for testing therapeutic agents can be selected after mutagenesis of whole animals or after treatment of
germline cells or zygotes. Such treatments include insertion of mutant BRCA1 alleles, usually from a second animal
species, as well as insertion of disrupted homologous genes. Alternatively, the endogenous BRCA1 gene(s) of the
animals may be disrupted by insertion or deletion mutation or other genetic alterations using conventional techniques
(Capecchi, 1989; Valancius and Smithies, 1991; Hasty et al., 1991; Shinkai et al., 1992; Mombaerts et al., 1992; Philpott
et al., 1992; Snouwaert et al., 1992; Donehower et al., 1992). After test substances have been administered to the
animals, the growth of tumors must be assessed. If the test substance prevents or suppresses the growth of tumors,
then the test substance is a candidate therapeutic agent for the treatment of the cancers identified herein. These animal
models provide an extremely important testing vehicle for potential therapeutic products.

The present invention is described by reference to the following Examples, which are offered by way of illustration
and are not intended to limit the invention in any manner. Standard techniques well known in the art or the techniques
specifically described below were utilized.

EXAMPLE 1 

Ascertain and Study Kindreds Likely to Have a 17q-Linked Breast Cancer Susceptibility Locus

Extensive cancer prone kindreds were ascertained from a defined population providing a large set of extended
kindreds with multiple cases of breast cancer and many relatives available to study. The large number of meioses present
in these large kindreds provided the power to detect whether the BRCA1 locus was segregating, and increased the
opportunity for informative recombinants to occur within the small region being investigated. This vastly improved the
chances of establishing linkage to the BRCA1 region, and greatly facilitated the reduction of the BRCA1 region to a
manageable size, which permits identification of the BRCA1 locus itself.

Each kindred was extended through all available connecting relatives, and to all informative first degree relatives
of each proband or cancer case. For these kindreds, additional breast cancer cases and individuals with cancer at other
sites of interest (e.g., ovarian) who also appeared in the kindreds were identified through the tumor registry linked files.
All breast cancers reported in the kindred which were not confirmed in the Utah Cancer Registry were researched.
Medical records or death certificates were obtained for confirmation of all cancers. Each key connecting individual and
all informative individuals were invited to participate by providing a blood sample from which DNA was extracted. We
also sampled spouses and relatives of deceased cases so that the genotype of the deceased cases could be inferred
from the genotypes of their relatives.

Ten kindreds which had three or more cancer cases with inferable genotypes were selected for linkage studies to
17q markers from a set of 29 kindreds originally ascertained from the linked databases for a study of proliferative breast
disease and breast cancer (Skolnick et al., 1990). The criterion for selection of these kindreds was the presence of two
sisters or a mother and her daughter with breast cancer. Additionally, two kindreds which have been studied since 1980
as part of our breast cancer linkage studies (K1001, K9018), six kindreds ascertained from the linked databases for the
presence of clusters of breast and/or ovarian cancer (K2019, K2073, K2079, K2080, K2039, K2082) and a self-referred
kindred with early onset breast cancer (K2035) were included. These kindreds were investigated and expanded in our
clinic in the manner described above. Table 1 displays [...] subsequent examples. In Table 1, for each kindred [...]
breast/ovarian cancer are reported. Kindreds [...] of breast cancer. Four women diagnosed with both ovarian
and breast cancer are counted in both categories.

EXAMPLE 2

Selection of Kindreds Which are Linked to Chromosome 17q and Localization of BRCA1 to the Interval Mfd15 -
Mfd188

For each sample collected in these 19 kindreds, DNA was extracted from blood (or in two cases from
paraffin-embedded tissue blocks) using standard laboratory protocols. Genotyping in this study was restricted to short tandem
repeat (STR) markers since, in general, they have high heterozygosity and PCR methods offer rapid turnaround while
using very small amounts of DNA. To aid in this effort, four such STR markers on chromosome 17 were developed by
screening a chromosome specific cosmid library for CA positive clones. Three of these markers localized to the long
arm: (46E6, Easton et al., 1993); (42D6, Easton et al., 1993); 26C2 (D17S514, Oliphant et al., 1991), while the other,
12G6 (D17S513, Oliphant et al., 1991), localized to the short arm near the p53 tumor suppressor locus. Two of these,
42D6 and 46E6, were submitted to the Breast Cancer Linkage Consortium for typing of breast cancer families by
investigators worldwide. Oligonucleotide sequences for markers not developed in our laboratory were obtained from published
reports, or as part of the Breast Cancer Linkage Consortium, or from other investigators. All genotyping films were scored
blindly with a standard lane marker used to maintain consistent coding of alleles. Key samples in the four kindreds
presented here underwent duplicate typing for all relevant markers. All 19 kindreds have been typed for two polymorphic
CA repeat markers: 42D6 (D17S588), a CA repeat isolated in our laboratory, and Mfd15 (D17S250), a CA repeat provided
by J. Weber (Weber et al., 1990). Several sources of probes were used to create genetic markers on chromosome 17,
specifically chromosome 17 cosmid and lambda phage libraries created from sorted chromosomes by the Los Alamos
National Laboratories (van Dilla et al., 1986).

LOD scores for each kindred with these two markers (42D6, Mfd15) and a third marker, Mfd188 (D17S579, Hall et
al., 1992), located roughly midway between these two markers, were calculated for two values of the recombination
fraction, 0.001 and 0.1. (For calculation of LOD scores, see Oh, 1985). Likelihoods were computed under the model
derived by Claus et al., 1991, which assumes an estimated gene frequency of 0.003, a lifetime risk in gene carriers of
about 0.80, and population based age-specific risks for breast cancer in non-gene carriers. Allele frequencies for the
three markers used for the LOD score calculations were calculated from our own laboratory typings of unrelated
individuals in the CEPH panel (White and Lalouel, 1988). Table 2 shows the results of the pairwise linkage analysis of each
kindred with the three markers 42D6, Mfd188, and Mfd15.

Using a criterion for linkage to 17q of a LOD score > 1.0 for at least one locus under the CASH model (Claus et al.,
1991), four of the 19 kindreds appeared to be linked to 17q (K1901, K1925, K2035, K2082). A number of additional
kindreds showed some evidence of linkage but at this time could not be definitively assigned to the linked category.
These included kindreds K1911, K2073, K2039, and K2080. Three of the 17q-linked kindreds had informative
recombinants in this region and these are detailed below.

Kindred 2082 is the largest 17q-linked breast cancer family reported to date by any group. The kindred contains 20
cases of breast cancer, and ten cases of ovarian cancer. Two cases have both ovarian and breast cancer. The evidence
of linkage to 17q for this family is overwhelming; the LOD score with the linked haplotype is over 6.0, despite the existence
of three cases of breast cancer which appear to be sporadic, i.e., these cases share no part of the linked haplotype
between Mfd15 and 42D6. These three sporadic cases were diagnosed with breast cancer at ages 46, 47, and 54. In
smaller kindreds, sporadic cancers of this type greatly confound the analysis of linkage and the correct identification of
key recombinants. The key recombinant in the 2082 kindred is a woman who developed ovarian cancer at age 45 whose
mother and aunt had ovarian cancer at ages 58 and 66, respectively. She inherited the linked portion of the haplotype
for both Mfd188 and 42D6 while inheriting unlinked alleles at Mfd15; this recombinant event placed BRCA1 distal to
Mfd15.

K1901 is typical of early-onset breast cancer kindreds. The kindred contains 10 cases of breast cancer with a
median age at diagnosis of 43.5 years of age; four cases were diagnosed under age 40.
The LOD score for this kindred with the marker 42D6 is 1.5, resulting in a posterior probability of 17q-linkage of
0.96. Examination of haplotypes in this kindred identified a recombinant haplotype in an obligate male carrier and his
affected daughter who was diagnosed with breast cancer at age 45. Their linked allele for marker Mfd15 differs from
that found in all other cases in the kindred (except one case which could not be completely inferred from her children).
The two haplotypes are identical for Mfd188 and 42D6. Accordingly, data from Kindred 1901 would also place the BRCA1
locus distal to Mfd15.
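The step from a LOD score to a posterior probability of linkage follows Bayes' rule, with 10 raised to the LOD score acting as the likelihood ratio. The Python sketch below shows that arithmetic. The prior probability of 0.45 is an assumption chosen only for illustration; the text does not state the prior that was used, and the function name `posterior_linkage` is hypothetical.

```python
def posterior_linkage(lod, prior):
    """Posterior probability that a kindred is linked, given its LOD score
    and a prior probability of linkage (Bayes' rule on the odds scale)."""
    likelihood_ratio = 10.0 ** lod
    return prior * likelihood_ratio / (prior * likelihood_ratio + (1.0 - prior))


# LOD of 1.5 as reported for K1901 with marker 42D6.
# The prior of 0.45 is an assumed value, not one given in the text.
print(round(posterior_linkage(1.5, 0.45), 2))
```

Under this assumed prior the result rounds to 0.96, which matches the figure quoted for K1901; other priors would give other posteriors.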

Kindred 2035 is similar to K1901 in disease phenotype. The median age of diagnosis for the eight cases of breast
cancer in this kindred is 37. One case also had ovarian cancer at age 60. The breast cancer cases in this family descend
from two sisters who were both unaffected with breast cancer until their death in the eighth decade. Each branch contains
four cases of breast cancer with at least one case in each branch having markedly early onset. This kindred has a LOD
score of 2.34 with Mfd15. The haplotypes segregating with breast cancer in the two branches share an identical allele
at Mfd15 but differ for the distal loci Mfd188 and NM23 (a marker typed as part of the consortium which is located just
distal to 42D6 (Hall et al., 1992)). Although the two haplotypes are concordant for marker 42D6, it is likely that the alleles
are shared identical by state (the same allele but derived from different ancestors), rather than identical by descent
(derived from a common ancestor) since the shared allele is the second most common allele observed at this locus. By
contrast, the linked allele shared at Mfd15 has a frequency of 0.04. This is a key recombinant in our dataset as it is the
sole recombinant in which BRCA1 segregated with the proximal portion of the haplotype, thus setting the distal boundary
to the BRCA1 region. For this event not to be a key recombinant requires that a second mutant BRCA1 gene be present
in a spouse marrying into the kindred who also shares the rare Mfd15 allele segregating with breast cancer in both
branches of the kindred. This event has a probability of less than one in a thousand. The evidence from this kindred
therefore placed the BRCA1 locus proximal to Mfd188.
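The LOD scores quoted in this example are base-10 logarithms of a likelihood ratio: the likelihood of the data at a given recombination fraction divided by the likelihood under free recombination (a recombination fraction of 0.5). The Python sketch below covers only the simplest textbook case of phase-known, fully informative meioses. It is not the penetrance-based likelihood model of Claus et al. used for the kindreds, and the function name `lod_score` and the counts in the usage lines are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    Simplifying assumption: every meiosis can be scored unambiguously as
    recombinant or non-recombinant, with no penetrance model.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)  # free recombination
    return log_l_theta - log_l_null


# Hypothetical counts, evaluated at the two recombination fractions
# used in the text (0.001 and 0.1).
for theta in (0.001, 0.1):
    print(theta, round(lod_score(0, 10, theta), 2))
```

With no recombinants, the score is highest at the smaller recombination fraction; a single recombinant reverses that ordering, which is why scores are reported at more than one value.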

EXAMPLE 3 

Creation of a Fine Structure Map and Refinement of the BRCA1 Region to Mfd191-Mfd188 using Additional STR
Polymorphisms

In order to improve the characterization of our recombinants and define closer flanking markers, a dense map of
this relatively small region on chromosome 17q was required. The chromosome 17 workshop has produced a consensus
map of this region (Figure 1) based on a combination of genetic and physical mapping studies (Fain, 1992). This map
contains both highly polymorphic STR polymorphisms, and a number of nonpolymorphic expressed genes. Because
this map did not give details on the evidence for this order nor give any measure of local support for inversions in the
order of adjacent loci, we viewed it as a rough guide for obtaining resources to be used for the development of new
markers and construction of our own detailed genetic and physical map of a small region containing BRCA1. Our
approach was to analyze existing STR markers provided by other investigators and any newly developed markers from
our laboratory with respect to both a panel of meiotic (genetic) breakpoints identified using DNA from the CEPH reference
families and a panel of somatic cell hybrids (physical breakpoints) constructed for this region. These markers included
26C2 developed in our laboratory which maps proximal to Mfd15, Mfd191 (provided by James Weber), THRA1 (Futreal
et al., 1992a), and three polymorphisms kindly provided to us by Dr. Donald Black, NM23 (Hall et al., 1992), SCG40
(D17S181), and 6C1 (D17S293).

Genetic localization of markers 

In order to localize new markers genetically within the region of interest, we have identified a number of key meiotic
breakpoints within the region, both in the CEPH reference panel and in our large breast cancer kindred (K2082). Given
the small genetic distance in this region, there are likely to be only a relatively small set of recombinants which can be
used for this purpose, and they are likely to group markers into sets. The orders of the markers within each set can only
be determined by physical mapping. However, the number of genotypings necessary to position a new marker is
minimized. These breakpoints are illustrated in Tables 3 and 4. Using this approach we were able to genetically order the
markers THRA1, 6C1, SCG40, and Mfd191. As can be seen from Tables 3 and 4, THRA1 and Mfd191 both map inside
the Mfd15-Mfd188 region we had previously identified as containing the BRCA1 locus. In Tables 3 and 4, M/P indicates
a maternal or paternal recombinant. A "1" indicates inherited allele is of grandpaternal origin, while a "0" indicates
grandmaternal origin, and "-" indicates that the locus was untyped or uninformative.

Analysis of markers Mfd15, Mfd188, Mfd191, and THRA1 in our recombinant families

Mfd15, Mfd188, Mfd191, and THRA1 were typed in our recombinant families and examined for additional information
to localize the BRCA1 locus. In kindred 1901, the Mfd15 recombinant was recombinant for THRA1 but uninformative
for Mfd191, thus placing BRCA1 distal to THRA1. In K2082, the recombinant with Mfd15 also was recombinant with
Mfd191, thus placing the BRCA1 locus distal to Mfd191 (Goldgar et al., 1994). Examination of THRA1 and Mfd191 in
kindred K2035 yielded no further localization information as the two branches were concordant for both markers.
However, SCG40 and 6C1 both displayed the same pattern as Mfd188, thus increasing our confidence in the localization
information provided by the Mfd188 recombinant in this family. The BRCA1 locus, or at least a portion of it, therefore
lies within an interval bounded by Mfd191 on the proximal side and Mfd188 on the distal side.
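The breakpoint-panel approach used in this example groups markers into sets: markers that show the same grandparental-origin pattern across every recombinant chromosome in the panel cannot be separated genetically and fall into one set. The Python sketch below illustrates that grouping step only. The marker names and patterns are hypothetical, the coding ("1" grandpaternal, "0" grandmaternal) follows the table legend, and for simplicity every marker is assumed to be typed and informative on every chromosome.

```python
from collections import defaultdict


def group_markers(panel):
    """Group markers with identical origin patterns across the panel.

    panel maps a marker name to a string with one character per recombinant
    chromosome: "1" for grandpaternal origin, "0" for grandmaternal origin.
    Assumes no untyped or uninformative ("-") entries.
    """
    groups = defaultdict(list)
    for marker, pattern in panel.items():
        groups[pattern].append(marker)
    return [sorted(names) for names in groups.values()]


# Hypothetical four-chromosome panel, not data from Tables 3 and 4.
example_panel = {"M1": "1100", "M2": "1000", "M3": "1100", "M4": "1000"}
print(group_markers(example_panel))
```

Markers within one returned set share all breakpoints in the panel, so their relative order would have to come from physical mapping, as the text notes.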


EXAMPLE 4 


Development of Genetic and Physical Resources in the Region of Interest

To increase the number of highly polymorphic loci in the Mfd191-Mfd188 region, we developed a number of STR
markers in our laboratory from cosmids and YACs which physically map to the region. These markers allowed us to
further refine the region. STSs were identified from genes known to be in the desired region to identify YACs which
contained these loci, which were then used to identify subclones in cosmids, P1s or BACs. These subclones were then
screened for the presence of a CA tandem repeat using a (CA)n oligonucleotide (Pharmacia). Clones with a strong
signal were selected preferentially, since they were more likely to represent CA-repeats which have a large number of
repeats and/or are of near-perfect fidelity to the (CA)n pattern. Both of these characteristics are known to increase the
probability of polymorphism (Weber, 1990). These clones were sequenced directly from the vector to locate the repeat.
We obtained a unique sequence on one side of the CA-repeat by using one of a set of possible primers complementary
to the end of a CA-repeat, such as (GT)10T. Based on this unique sequence, a primer was made to sequence back
across the repeat in the other direction, yielding a unique sequence for design of a second primer flanking the CA-repeat.
STRs were then screened for polymorphism on a small group of unrelated individuals and tested against the hybrid
panel to confirm their physical localization. New markers which satisfied these criteria were then typed in a set of 40
unrelated individuals from the Utah and CEPH families to obtain allele frequencies appropriate for the study population.
Many of the other markers reported in this study were tested in a smaller group of CEPH unrelated individuals to obtain
similarly appropriate allele frequencies.
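Allele frequencies of the kind described above are obtained by counting alleles across the typed unrelated individuals, and marker heterozygosity can then be estimated from those frequencies as one minus the sum of squared allele frequencies. The Python sketch below shows both steps. The genotypes are hypothetical, the function names are illustrative, and the heterozygosity formula is the standard expected value under random mating, which the text does not state was the estimator used.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies from diploid genotypes.

    genotypes is a list of (allele, allele) pairs, one per unrelated individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


# Hypothetical typings of four unrelated individuals at one STR marker.
typings = [(1, 2), (1, 1), (2, 3), (1, 3)]
freqs = allele_frequencies(typings)
print(freqs, expected_heterozygosity(freqs))
```

A marker with many alleles of similar frequency gives a heterozygosity close to 1, which is the property that makes STR markers informative in linkage studies.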

Using the procedure described above, a total of eight polymorphic STRs was found from these YACs. Of the loci
identified in this manner, four were both polymorphic and localized to the BRCA1 region. Four markers did not localize
to chromosome 17, reflecting the chimeric nature of the YACs used. The four markers which were in the region were
denoted AA1, ED2, 4-7, and YM29. AA1 and ED2 were developed from YACs positive for the RNU2 gene, 4-7 from an
EPB3 YAC and YM29 from a cosmid which localized to the region by the hybrid panel. A description of the number of
alleles, heterozygosity and source of these four and all other STR polymorphisms analyzed in the breast cancer kindreds
is given below in Table 5.

The four STR polymorphisms which mapped physically to the region (4-7, ED2, AA1, YM29) were analyzed in the
meiotic breakpoint panel shown initially in Tables 3 and 4. Tables 6 and 7 contain the relevant CEPH data and Kindred
2082 data for localization of these four markers. In the tables, M/P indicates a maternal or paternal recombinant. A "1"
indicates inherited allele is of grandpaternal origin, while a "0" indicates grandmaternal origin, and "-" indicates that the
locus was untyped or uninformative.

From CEPH 1333-04, we see that AA1 and YM29 must lie distal to Mfd191. From 13292, it can be inferred that both
AA1 and ED2 are proximal to 4-7, YM29, and Mfd188. The recombinants found in K2082 provide some additional
ordering information. Three independent observations (individual numbers 22, 40, & 63) place AA1, ED2, 4-7, YM29,
and Mfd188 distal to Mfd191, while ID 125 places 4-7, YM29, and Mfd188 proximal to SCG40. No genetic information
on the relative ordering within the two clusters of markers AA1/ED2 and 4-7/YM29/Mfd188 was obtained from the genetic
recombinant analysis. Although ordering loci with respect to hybrids which are known to have "holes" in which small
pieces of interstitial human DNA may be missing is problematic, the hybrid patterns indicate that 4-7 lies [...]
YM29 and Mfd188.

EXAMPLE 5

Genetic Analyses of Breast Cancer Kindreds with Markers AA1, 4-7, ED2, and YM29

In addition to the three kindreds containing key recombinants which have been discussed previously, kindred K2039
was shown through analysis of the newly developed STR markers to be linked to the region and to contain a useful
recombinant.

io Table 3 defines the haplotypes (shown in coded form) of the kindreds in terms of specific marker alleles at each 

locus and their respective frequencies In Table S alleles are listed m descending order of frequency frequencies of 
alleles 1 -5 for each locus are given in Table 5 Haplotypes coded H are BRCA1 associated haplotypes P designates a 
partial H haplotype and an R indicates an observable recombinant haplotype As evident in Table 8 not all kindreds 
were typed for all markers, moreover not ail individuals within a kindred were typed for an identical set of markers. 

i5 especially in K2032 With one exception, only haplotypes inherited from affected or at-risk kindred members are shown, 
haplotypes from spouses marrying into the kindred are not described. Thus in a given sibship the appearance of hap- 
lotypes X and Y indicates that both haplotypes from the affected/at-risk individual were seen and neither was a breast 
cancer associated haplotype 
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In kindred K1 901 the new markers showed no observable recombination with breast cancer susceptibility, indicating 

«« « « MAM mo. hkety ,ook pl.c, THH», -O EM 

localization information was obtained based upon studying the four new markers in this kindred. In kindred 2082 the
key recombinant individual has inherited the linked alleles for ED2, 4-7, AA1 and YM29, and was recombinant for tdj1474,
indicating that the recombination event occurred in this individual between tdj1474 and ED2/AA1.

There are three haplotypes of interest in kindred K2035, H1, H2 and R2, shown in Table 8. H1 is present in the four
cases and one obligate male carrier descendant from individual 17, while H2 is present or inferred in two cases and two
obligate male carriers in descendants of individual 10. R2 is identical to H2 for loci between and including Mfd15 and
SCG40 but has recombined between SCG40 and 42D6. Since we have established that BRCA1 is proximal to 42D6,
this H2/R2 difference adds no further localization information. H1 and R2 share an identical allele at Mfd15, THRA1,
AA1 and ED2 but differ for loci presumed distal to ED2, i.e., 4-7, Mfd188, SCG40 and SCI. Although the two haplotypes
are concordant for the 5th allele for marker YM29, a marker which maps physically between 4-7 and Mfd188, it is likely
that the alleles are shared identical by state rather than identical by descent, since this allele is the most common allele
at this locus, with a frequency estimated in CEPH parents of 0.42. By contrast, the linked alleles shared at the Mfd15
and ED2 loci have frequencies of 0.04 and 0.09, respectively. They also share more common alleles at Mfd191
(frequency = 0.52), THRA1, and AA1 (frequency = 0.25). This is the key recombinant in the set, as it is the sole recombinant
in which breast cancer segregated with the proximal portion of the haplotype, thus setting the distal boundary. The
evidence from this kindred therefore places the BRCA1 locus proximal to 4-7.
The recombination event in kindred 2082 which places BRCA1 distal to tdj1474 is the only one of the four events
described which can be directly inferred; that is, the affected mother's genotype can be inferred from her spouse and
offspring, and the recombinant haplotype can be seen in her affected daughter. In this family the odds in favor of affected
individuals carrying BRCA1 susceptibility alleles are extremely high; the only possible interpretations of the data are
that BRCA1 is distal to Mfd191 or alternatively that the purported recombinant is a sporadic case of ovarian cancer at
age 44. Rather than a directly observable or inferred recombinant, interpretation of kindred 2035 depends on the
observation of distinct 17q-haplotypes segregating in different and sometimes distantly related branches of the kindred.
The observation that portions of these haplotypes have alleles in common for some markers while they differ at other
markers places the BRCA1 locus in the shared region. The confidence in this placement depends on several factors:
the relationship between the individuals carrying the respective haplotypes, the frequency of the shared allele, the
certainty with which the haplotypes can be shown to segregate with the BRCA1 locus, and the density of the markers in
the region which define the haplotype. In the case of kindred 2035, the two branches are closely related, and each
branch has a number of early onset cases which carry the respective haplotype. While two of the shared alleles are
common (Mfd191, THRA1), the estimated frequencies of the shared alleles at Mfd15, AA1, and ED2 are 0.04, 0.28,
and 0.09, respectively. It is therefore highly likely that these alleles are identical by descent (derived from a common
ancestor) rather than identical by state (the same allele but derived from the general population).
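
The identity-by-descent argument above can be illustrated with a small calculation. The sketch below multiplies the quoted population frequencies of the shared alleles to approximate the chance that an unrelated haplotype would match at those loci purely by state. It assumes the loci are statistically independent (linkage equilibrium), a simplification that the text does not state, and the function name is illustrative only.

```python
from math import prod

def chance_of_matching_by_state(allele_freqs):
    """Probability that an unrelated haplotype carries the same allele at every
    listed locus, assuming independence between loci (an assumption of this sketch)."""
    return prod(allele_freqs)

# Frequencies of the shared alleles quoted in the text for kindred 2035
shared_alleles = {"Mfd15": 0.04, "AA1": 0.28, "ED2": 0.09}

p_rare = chance_of_matching_by_state(shared_alleles.values())
print(f"chance of a three-locus match by state: {p_rare:.6f}")

# For comparison, the common YM29 allele alone matches by state 42% of the time
print(f"chance of a YM29 match by state: {chance_of_matching_by_state([0.42]):.2f}")
```

Under these assumptions a chance match at all three of the rarer loci is on the order of one in a thousand, whereas a match at the common YM29 allele is unremarkable, which is the contrast the text draws.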

EXAMPLE 6 

Refined Physical Mapping Studies Place the BRCA1 Gene in a Region Flanked by tdj1474 and U5R

Since its initial localization to chromosome 17q in 1990 (Hall et al., 1990), a great deal of effort has gone into localizing
the BRCA1 gene to a region small enough to allow implementation of effective positional cloning strategies to isolate
the gene. The BRCA1 locus was first localized to the interval Mfd15 (D17S250) - 42D6 (D17S588) by multipoint linkage
analysis (Easton et al., 1993) in the collaborative Breast Cancer Linkage Consortium dataset consisting of 214 families
collected worldwide. Subsequent refinements of the localization have been based upon individual recombinant events
in specific families. The region THRA1 - D17S183 was defined by Bowcock et al., 1993, and the region THRA1 - D17S78
was defined by Simard et al., 1993.

We further showed that the BRCA1 locus must lie distal to the marker Mfd191 (D17S776) (Goldgar et al., 1994).
This marker is known to lie distal to THRA1 and RARA. The smallest published region for the BRCA1 locus is thus
between D17S776 and D17S78. This region still contains approximately 15 million bases of DNA, making the isolation
and testing of all genes in the region a very difficult task. We have therefore undertaken the tasks of constructing a
physical map of the region, isolating a set of polymorphic STR markers located in the region, and analyzing these new
markers in a set of informative families to refine the location of the BRCA1 gene to a manageable interval.

Four families provide important genetic evidence for localization of BRCA1 to a sufficiently small region for the
application of positional cloning strategies. Two families (K2082, K1901) provide data relating to the proximal boundary
for BRCA1 and the other two (K2035, K1813) fix the distal boundary. These families are discussed in detail below. A
total of 15 Short Tandem Repeat markers assayable by PCR were used to refine this localization in the families studied.
These markers include DS17S7654, DS17S975, tdj1474, and tdj1239. Primer sequences for these markers are provided
in SEQ ID NO:3 and SEQ ID NO:4 for DS17S754; in SEQ ID NO:5 and SEQ ID NO:6 for DS17S975; in SEQ ID NO:7
and SEQ ID NO:8 for tdj1474; and in SEQ ID NO:9 and SEQ ID NO:10 for tdj1239.

Kindred 2082 

Kindred 2082 is the largest BRCA1-linked breast/ovarian cancer family studied to date. It has a LOD score of 8.6,
providing unequivocal evidence for 17q linkage. This family has been previously described and shown to contain a
critical recombinant placing BRCA1 distal to MFD191 (D17S776). This recombinant occurred in a woman diagnosed
with ovarian cancer at age 45 whose mother had ovarian cancer at age 63. The affected mother was deceased; however,
from her children she could be inferred to have the linked haplotype present in the 30 other linked cases in the family
in the region between Mfd15 and Mfd188. Her affected daughter received the linked allele at the loci ED2, 4-7, and
Mfd188 but received the allele on the non-BRCA1 chromosome at Mfd15 and Mfd191. In order to further localize this
recombination breakpoint, we tested the key members of this family for the following markers derived from physical
mapping resources: tdj1474, tdj1239, CF4, D17S855. For the markers tdj1474 and CF4, the affected daughter did not
receive the linked allele. For the STR locus tdj1239, however, the mother could be inferred to be informative, and her
daughter did receive the BRCA1-associated allele. D17S855 was not informative in this family. Based on this analysis,
the order is 17q centromere - Mfd191 - 17HSD - CF4 - tdj1474 - tdj1239 - D17S855 - ED2 - 4-7 - Mfd188 - 17q telomere.
The recombinant described above therefore places BRCA1 distal to tdj1474, and the breakpoint is localized to the
interval between tdj1474 and tdj1239. The only alternative explanation for the data in this family, other than that of BRCA1
being located distal to tdj1474, is that the ovarian cancer present in the recombinant individual is caused by reasons
independent of the BRCA1 gene. Given that ovarian cancer diagnosed before age 50 is rare, this alternate explanation
is exceedingly unlikely.

Kindred 1901

Kindred 1901 is an early-onset breast cancer family with 7 cases of breast cancer diagnosed before 50, 4 of which
were diagnosed before age 40. In addition, there were three cases of breast cancer diagnosed between the ages of 50
and 70. One case of breast cancer also had ovarian cancer at age 61. This family currently has a LOD score of 1.5 with
D17S855. Given this linkage evidence and the presence of at least one ovarian cancer case, this family has a posterior
probability of being due to BRCA1 of over 0.99. In this family, the recombination comes from the fact that an individual
who is the brother of the ovarian cancer case from which the majority of the other cases descend only shares a portion
of the haplotype which is cosegregating with the other cases in the family. However, he passed this partial haplotype to
his daughter, who developed breast cancer at age 44. If this case is due to the BRCA1 gene, then only the part of the
haplotype shared between this brother and his sister can contain the BRCA1 gene. The difficulty in interpretation of this
kind of information is that while one can be sure of the markers which are not shared and therefore recombinant, markers
which are concordant can either be shared because they are non-recombinant or because their parent was
homozygous. Without the parental genotypic data it is impossible to discriminate between these alternatives. Inspection
of the haplotype in K1901 shows that he does not share the linked allele at Mfd15 (D17S250), THRA1, CF4 (D17S1320),
and tdj1474 (D17S1321). He does share the linked allele at Mfd191 (D17S776), ED2 (D17S1327), tdj1239 (D17S1328),
and Mfd188 (D17S579). Although the allele shared at Mfd191 is relatively rare (0.07), we would presume that the parent
was homozygous, since they are recombinant with markers located nearby on either side, and a double recombination
event in this region would be extremely unlikely. Thus the evidence in this family would also place the BRCA1 locus
distal to tdj1474. However, the lower limit of this breakpoint is impossible to determine without parental genotype
information. It is intriguing that the key recombinant breakpoint in this family confirms the result in Kindred 2082. As before,
the localization information in this family is only meaningful if the breast cancer was due to the BRCA1 gene. However,
her relatively early age at diagnosis (44) makes this seem very likely, since the risk of breast cancer before age 45 in
the general population is low (approximately 1%).

Kindred 2035

This family is similar to K1901 in that the information on the critical recombinant events is not directly observed but
is inferred from the observation that the two haplotypes which are cosegregating with the early onset breast cancer in
the two branches of the family appear identical for markers located in the proximal portion of the 17q BRCA1 region but
differ at more distal loci. Each of these two haplotypes occurs in at least four cases of early-onset or bilateral breast
cancer. The overall LOD score with ED2 in this family is 2.2, and considering that there is a case of ovarian cancer in
the family (indicating a prior probability of BRCA1 linkage of 80%), the resulting posterior probability that this family is
linked to BRCA1 is 0.998. The haplotypes are identical for the markers Mfd15, THRA1, Mfd191, ED2, AA1, D17S858
and D17S902. The common alleles at Mfd15 and ED2 are both quite rare, indicating that this haplotype is shared identical
by descent. The haplotypes are discordant, however, for CA375, 4-7, and Mfd188 and several more distal markers.
This indicates that the BRCA1 locus must lie above the marker CA375. This marker is located approximately 50 kb
below D17S78, so it serves primarily as additional confirmation of this previous lower boundary as reported in Simard
et al. (1993).
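
The posterior probabilities quoted for these kindreds are consistent with a standard Bayesian conversion of a LOD score, in which the likelihood ratio for linkage is 10 raised to the LOD. The sketch below is an illustration of that conversion, not a description of the software the authors used. The 80% prior is the figure quoted for kindred 2035; applying the same prior to kindred 1901 is an assumption of this sketch, and the function name is hypothetical.

```python
def posterior_linked(lod, prior):
    """Posterior probability of linkage given a LOD score and a prior probability.

    The LOD score is log10 of the likelihood ratio (linked vs. unlinked),
    so Bayes' rule gives prior*LR / (prior*LR + (1 - prior)).
    """
    weighted = prior * 10 ** lod
    return weighted / (weighted + (1.0 - prior))

# LOD scores quoted in the text; the shared 0.80 prior is an assumption for K1901
for kindred, lod in [("2035", 2.2), ("1901", 1.5)]:
    print(kindred, round(posterior_linked(lod, prior=0.80), 3))
```

With these inputs the conversion gives about 0.998 for kindred 2035 and a value above 0.99 for kindred 1901, in line with the figures stated in the text.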

Kindred 1813

Kindred 1813 is a small family with four cases of breast cancer diagnosed under the age of 40 whose mother had
breast cancer diagnosed at age 45 and ovarian cancer at age 61. This situation is somewhat complicated by the fact that
the four cases appear to have three different fathers, only one of whom has been genotyped. However, by typing a
number of different markers in the BRCA1 region as well as highly polymorphic markers elsewhere in the genome, the
paternity of all children in the family has been determined with a high degree of certainty. This family yields a maximum
multipoint LOD score of 0.50 with 17q markers and, given that there is at least one case of ovarian cancer, results in a
posterior probability of being a BRCA1 linked family of 0.93. This family contains a directly observable recombination
event in individual 18 (see Figure 5 in Simard et al., Human Mol. Genet. 2:1193-1199 (1993)), who developed breast
cancer at age 34. The genotype of her affected mother at the relevant 17q loci can be inferred from her genotypes, her
affected sister's genotypes, and the genotypes of three other unaffected siblings. Individual 18 inherits the BRCA1-linked
alleles for the following loci: Mfd15, THRA1, D17S800, D17S855, AA1, and D17S931. However, for markers below
D17S931, i.e., U5R, vrs31, D17S858, and D17S579, she has inherited the alleles located on the non-disease bearing
chromosome. The evidence from this family therefore would place the BRCA1 locus proximal to the marker U5R.
Because of her early age at diagnosis (34), it is extremely unlikely that the recombinant individual's cancer is not due to the
gene responsible for the other cases of breast/ovarian cancer in this family; the uncertainty in this family comes from
our somewhat smaller amount of evidence that breast cancer in this family is due to BRCA1 rather than a second, as
yet unmapped, breast cancer susceptibility locus.

Size of the region containing BRCA1

Based on the genetic data described in detail above, the BRCA1 locus must lie in the interval between the markers
tdj1474 and U5R, both of which were isolated in our laboratory. Based upon the physical maps shown in Figures 2 and
3, we can try to estimate the physical distance between these two loci. It takes approximately 14 P1 clones with an
average insert size of approximately 80 kb to span the region. However, because all of these P1s overlap to some
unknown degree, the physical region is most likely much smaller than 14 times 80 kb. Based on restriction maps of the
clones covering the region, we estimate the size of the region containing BRCA1 to be approximately 650 kb.
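
The size argument can be made explicit with simple arithmetic. The sketch below takes the average insert size as 80 kb (the reading of the figure assumed here) and compares the no-overlap upper bound with the restriction-map estimate; the variable names are illustrative.

```python
n_clones = 14              # P1 clones needed to span the region
avg_insert_kb = 80         # approximate average insert size (assumed reading of the text)
estimated_region_kb = 650  # estimate from restriction maps of the clones

# If the clones did not overlap at all, the region could be at most this large
upper_bound_kb = n_clones * avg_insert_kb

# Fraction of the summed insert length that is unique sequence under the 650 kb estimate
unique_fraction = estimated_region_kb / upper_bound_kb

print(f"upper bound without overlap: {upper_bound_kb} kb")
print(f"implied unique fraction: {unique_fraction:.2f}")
```

On these numbers roughly 40% of the summed insert length would be redundant overlap between neighbouring clones.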

EXAMPLE 7

Identification of Candidate cDNA Clones for the BRCA1 Locus by Genomic Analysis of the Contig Region

Complete screen of the plausible region

The first method to identify candidate cDNAs, although labor intensive, used known techniques. The method
comprised the screening of cosmids and P1 and BAC clones in the contig to identify putative coding sequences. The clones
containing putative coding sequences were then used as probes on filters of cDNA libraries to identify candidate cDNA
clones for future analysis. The clones were screened for putative coding sequences by either of two methods.

Zoo blots 

The first method for identifying putative coding sequences was by screening the cosmid and P1 clones for sequences
conserved through evolution across several species. This technique is referred to as "zoo blot analysis" and is described
by Monaco, 1986. Specifically, DNAs from cow, chicken, pig, mouse and rat were digested with the restriction enzymes
EcoRI and HindIII (8 µg of DNA per enzyme). The digested DNAs were separated overnight on a 0.7% gel at 20 volts
for 16 hours (14 cm gel), and the DNA transferred to Nylon membranes using standard Southern blot techniques. For
example, the zoo blot filter was treated at 65°C in 0.1x SSC, 0.5% SDS, and 0.2 M Tris, pH 8.0, for 30 minutes and then
blocked overnight at 42°C in 5x SSC, 10% PEG 8000, 20 mM NaPO4 pH 6.8, 100 µg/ml Salmon Sperm DNA, 1x
Denhardt's, 50% formamide, 0.1% SDS, and 2 µg/ml C0t-1 DNA.

The cosmid and P1 clones to be analyzed were digested with a restriction enzyme to release the human DNA from
the vector DNA. The DNA was separated on a 14 cm, 0.5% agarose gel run overnight at 20 volts for 16 hours. The
human DNA bands were cut out of the gel and electroeluted from the gel wedge at 100 volts for at least two hours in
0.5x Tris Acetate buffer (Maniatis et al., 1982). The eluted Not I digested DNA (~15 kb to 25 kb) was then digested with
EcoRI restriction enzyme to give smaller fragments (~0.5 kb to 5.0 kb) which melt apart more easily for the next step
of labeling the DNA with radionucleotides. The DNA fragments were labeled by means of the hexamer random prime
labeling method (Boehringer-Mannheim, Cat. #1004760). The labeled DNA was spermine precipitated (add 100 µl TE,
5 µl 0.1 M spermine, and 5 µl of 10 mg/ml salmon sperm DNA) to remove unincorporated radionucleotides. The labeled
DNA was then resuspended in 100 µl TE, 0.5 M NaCl at 65°C for 5 minutes and then blocked with Human C0t-1 DNA
for 2-4 hrs. as per the manufacturer's instructions (Gibco/BRL, Cat. #5279SA). The C0t-1 blocked probe was incubated
on the zoo blot filters in the blocking solution overnight at 42°C. The filters were washed for 30 minutes at room
temperature in 2x SSC, 0.1% SDS, and then in the same buffer for 30 minutes at 55°C. The filters were then exposed 1 to
3 days at -70°C to Kodak XAR-5 film with an intensifying screen. Thus, the zoo blots were hybridized with either the
pool of EcoRI fragments from the insert or each of the fragments individually.

HTF island analysis

The second method for identifying cosmids to use as probes on the cDNA libraries was HTF island analysis. Since
the pulsed-field map can reveal HTF islands, cosmids that map to these HTF island regions were analyzed with priority.
HTF islands are segments of DNA which contain a very high frequency of unmethylated CpG dinucleotides (Tonolio et
al., 1990) and are revealed by the clustering of restriction sites of enzymes whose recognition sequences include CpG
dinucleotides. Enzymes known to be useful in HTF-island analysis are AscI, NotI, BssHII, EagI, SacII, NaeI, NarI, SmaI,
and MluI (Anand, 1992). A pulsed-field map was created using the enzymes NotI, NruI, EagI, SacII, and SalI, and two
HTF islands were found. These islands are located in the distal end of the region, one being distal to the GP2B locus
and the other being proximal to the same locus, both outside the BRCA1 region. The cosmids derived from the YACs
that cover these two locations were analyzed to identify those that contain these restriction sites, and thus the HTF
islands.
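
When a DNA sequence is available, the same idea (clusters of rare-cutter sites marking an HTF island) can be illustrated computationally. The sketch below is an in-silico analogue only; the text describes a wet-lab pulsed-field mapping approach, not sequence scanning. The recognition sequences are the standard ones for the four enzymes listed, while the window size, the site threshold, and the function names are arbitrary assumptions made for illustration.

```python
import re

# Recognition sequences of four CpG-containing rare-cutter enzymes named in the text
RARE_CUTTERS = {
    "NotI": "GCGGCCGC",
    "EagI": "CGGCCG",    # note: every NotI site also contains an EagI site
    "SacII": "CCGCGG",
    "BssHII": "GCGCGC",
}

def find_sites(seq, enzymes=RARE_CUTTERS):
    """Return sorted (position, enzyme) pairs for every recognition site in seq."""
    seq = seq.upper()
    hits = []
    for name, site in enzymes.items():
        # A lookahead pattern reports overlapping occurrences as well
        for match in re.finditer(f"(?={site})", seq):
            hits.append((match.start(), name))
    return sorted(hits)

def clustered_windows(seq, window=1000, min_sites=3):
    """Return site positions that begin a window containing at least min_sites sites."""
    positions = [pos for pos, _ in find_sites(seq)]
    starts = []
    for i, pos in enumerate(positions):
        in_window = [q for q in positions[i:] if q - pos < window]
        if len(in_window) >= min_sites:
            starts.append(pos)
    return starts

# Synthetic example sequence (made up for illustration, not real genomic data)
toy = "AT" * 50 + "GCGGCCGC" + "AT" * 10 + "CCGCGG" + "AT" * 10 + "GCGCGC" + "AT" * 500
print(find_sites(toy))
print(clustered_windows(toy))
```

Windows flagged in this way would be candidates for closer inspection; a real analysis would also need to account for methylation status, which sequence alone does not reveal.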

cDNA screening

Those clones that contain HTF islands or show hybridization to other species DNA besides human are likely to
contain coding sequences. The human DNA from these clones was isolated as whole insert or as EcoRI fragments and
labeled as described above. The labeled DNA was used to screen filters of various cDNA libraries under the same
conditions as the zoo blots except that the cDNA filters undergo a more stringent wash of 0.1x SSC, 0.1% SDS at 65°C
for 30 minutes twice.

Most of the cDNA libraries used to date in our studies (libraries from normal breast tissue, breast tissue from a
woman in her eighth month of pregnancy, and a breast malignancy) were prepared at Clonetech, Inc. The cDNA library
generated from breast tissue of an 8 month pregnant woman is available from Clonetech (Cat. #HL1037a) in the Lambda
gt-10 vector and is grown in C600Hfl bacterial host cells. Normal breast tissue and malignant breast tissue samples
were isolated from a 37 year old Caucasian female, and one gram of each tissue was sent to Clonetech for mRNA
processing and cDNA library construction. The latter two libraries were generated using both random and oligo-dT
priming, with size selection of the final products, which were then cloned into the Lambda Zap II vector and grown in
XL1-blue strain of bacteria as described by the manufacturer. Additional tissue-specific cDNA libraries include human
fetal brain (Stratagene, Cat. 936206), human testis (Clonetech Cat. HL3024), human thymus (Clonetech Cat. HL1127n),
human brain (Clonetech Cat. HL11310), human placenta (Clonetech Cat. 1075b), and human skeletal muscle (Clonetech
Cat. HL1124b).

The cDNA libraries were plated with their host cells on NZCYM plates, and filter lifts were made in duplicate from
each plate as per Maniatis et al. (1982). Insert (human) DNA from the candidate genomic clones was purified and
radioactively labeled to high specific activity. The radioactive DNA was then hybridized to the cDNA filters to identify
those cDNAs which correspond to genes located within the candidate cosmid clone. cDNAs identified by this method
were picked, replated, and screened again with the labeled clone insert or its derived EcoRI fragment DNA to verify their
positive status. Clones that were positive after this second round of screening were then grown up and their DNA purified
for Southern blot analysis and sequencing. Clones were either purified as plasmid through in vivo excision of the plasmid
from the Lambda vector as described in the protocols from the manufacturers, or isolated from the Lambda vector as a
restriction fragment and subcloned into plasmid vector.

The Southern blot analysis was performed in duplicate, one using the original genomic insert DNA as a probe to
verify that cDNA insert contains hybridizing sequences. The second blot was hybridized with cDNA insert DNA from the
largest cDNA clone to identify which clones represent the same gene. All cDNAs which hybridize with the genomic clone
and are unique were sequenced and the DNA analyzed to determine if the sequences represent known or unique genes.
All cDNA clones which appear to be unique were further analyzed as candidate BRCA1 loci. Specifically, the clones are
hybridized to Northern blots to look for breast specific expression and differential expression in normal versus breast
tumor RNAs. They are also analyzed by PCR on clones in the BRCA1 region to verify their location. To map the extent
of the locus, full length cDNAs are isolated and their sequences used as PCR probes on the YACs and the clones
surrounding and including the original identifying clones. Intron-exon boundaries are then further defined through
sequence analysis.

We have screened the normal breast, 8 month pregnant breast and fetal brain cDNA libraries with zoo blot-positive
EcoRI fragments from cosmid, BAC and P1 clones in the region. Potential BRCA1 cDNA clones were identified among
the three libraries. Clones were picked, replated, and screened again with the original probe to verify that they were
positive.

Analysis of hybrid-selected cDNA

cDNA fragments obtained from direct selection were checked by Southern blot hybridization against the probe DNA
to verify that they originated from the contig. Those that passed this test were sequenced in their entirety. The set of
DNA sequences obtained in this way were then checked against each other to find independent clones that overlapped.
For example, the clones 694-65, 1240-1 and 1240-33 were obtained independently and subsequently shown to derive
from the same contiguous cDNA sequence which has been named EST 4S9 1.

Analysis of candidate clones

One or more of the candidate genes generated from above were sequenced and the information used for
identification and classification of each expressed gene. The DNA sequences were compared to known genes by nucleotide
sequence comparisons and by translation in all frames followed by a comparison with known amino acid sequences.
This was accomplished using Genetic Data Environment (GDE) version 2.2 software and the Basic Local Alignment
Search Tool (BLAST) series of client/server software packages (e.g., BLASTN 1.3.13MP), for sequence comparison
against both local and remote sequence databases (e.g., GenBank), running on Sun SPARC workstations. Sequences
reconstructed from collections of cDNA clones identified with the cosmids and P1s have been generated. All candidate
genes that represented new sequences were analyzed further to test their candidacy for the putative BRCA1 locus.
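
The "translation in all frames" step mentioned above can be sketched as a six-frame conceptual translation. The code below is a minimal illustration using the standard genetic code; it is not the GDE or BLAST software named in the text, and the function names and the toy input are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons from the start of seq; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def six_frame_translations(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames

# Toy input (made up for illustration): ATG GCC TGA
for frame, protein in six_frame_translations("ATGGCCTGA").items():
    print(frame, protein)
```

Each of the six translated strings could then be compared against a protein database, which is the role the BLAST programs play in the workflow described.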

Mutation screening

To screen for mutations in the affected pedigrees, two different approaches were followed. First, genomic DNA
isolated from family members known to carry the susceptibility allele of BRCA1 was used as a template for amplification
of candidate gene sequences by PCR. If the PCR primers flank or overlap an intron/exon boundary, the amplified
fragment will be larger than predicted from the cDNA sequence or will not be present in the amplified mixture. By a
combination of such amplification experiments and sequencing of P1, BAC or cosmid clones using the set of designed primers,
it is possible to establish the intron/exon structure and ultimately obtain the DNA sequences of genomic DNA from the
pedigrees.

A second approach that is much more rapid if the intron/exon structure of the candidate gene is complex involves
sequencing fragments amplified from pedigree lymphocyte cDNA. cDNA synthesized from lymphocyte mRNA extracted
from pedigree blood was used as a substrate for PCR amplification using the set of designed primers. If the candidate
gene is expressed to a significant extent in lymphocytes, such experiments usually produce amplified fragments that
can be sequenced directly without knowledge of intron/exon junctions.

The products of such sequencing reactions were analyzed by gel electrophoresis to determine positions in the
sequence that contain either mutations such as deletions or insertions, or base pair substitutions that cause amino acid
changes or other detrimental effects.

Any sequence within the BRCA1 region that is expressed in breast is considered to be a candidate gene for BRCA1.
Compelling evidence that a given candidate gene corresponds to BRCA1 comes from a demonstration that pedigree
families contain defective alleles of the candidate.


EXAMPLE 8 
Identification of BRCA1 
Identification of BRCA1

Using several strategies, a detailed map of transcripts was developed for the 600 kb region of 17q21 between
D17S1321 and D17S1324. Candidate expressed sequences were defined as DNA sequences obtained from: 1) direct
screening of breast, fetal brain, or lymphocyte cDNA libraries, 2) hybrid selection of breast, lymphocyte or ovary cDNAs,
or 3) random sequencing of genomic DNA and prediction of coding exons by XPOUND (Thomas and Skolnick, 1994).
These expressed sequences in many cases were assembled into contigs composed of several independently identified
sequences. Candidate genes may comprise more than one of these candidate expressed sequences. Sixty-five
candidate expressed sequences within this region were identified by hybrid selection, by direct screening of cDNA libraries,
and by random sequencing of P1 subclones. Expressed sequences were characterized by transcript size, DNA
sequence, database comparison, expression pattern, genomic structure, and, most importantly, DNA sequence analysis
in individuals from kindreds segregating 17q-linked breast and ovarian cancer susceptibility.

Three independent contigs of expressed sequence, 1141.1 (649 bp), 694.5 (213 bp) and 754.2 (1079 bp), were
isolated and eventually shown to represent portions of BRCA1. When ESTs for these contigs were used as hybridization
probes for Northern analysis, a single transcript of approximately 7.8 kb was observed in normal breast mRNA,
suggesting that they encode different portions of a single gene. Screens of breast, fetal brain, thymus, testes, lymphocyte
and placental cDNA libraries and PCR experiments with breast mRNA linked the 1141.1, 694.5 and 754.2 contigs. 5'
RACE experiments with thymus, testes and breast mRNA extended the contig to the putative 5' end, yielding a composite
full length sequence. PCR and direct sequencing of P1s and BACs in the region were used to identify the location of
introns and allowed the determination of splice donor and acceptor sites. These three expressed sequences were
merged into a single transcription unit that proved in the final analysis to be BRCA1. This transcription unit is located
adjacent to D17S855 in the center of the 600 kb region (Fig. 4).

Combination of sequences obtained from cDNA clones, hybrid selection sequences, and amplified PCR products
allowed construction of a composite full length BRCA1 cDNA (SEQ ID NO:1). The sequence of the BRCA1 cDNA (up
through the stop codon) has also been deposited with GenBank and assigned accession number U-14680. This
deposited sequence is incorporated herein by reference. The cDNA clone extending farthest in the 3' direction contains a poly
(A) tract preceded by a polyadenylation signal. Conceptual translation of the cDNA revealed a single long open reading
frame encoding a protein of 208 kilodaltons (amino acid sequence: SEQ ID NO:2) with a potential initiation codon flanked
by sequences resembling the Kozak consensus sequence (Kozak, 1987). Smith-Waterman (Smith and Waterman, 1981)
and BLAST (Altschul et al., 1990) searches identified a sequence near the amino terminus with considerable homology
to zinc-finger domains (Fig. 5). This sequence contains cysteine and histidine residues present in the consensus C3HC4
zinc-finger motif and shares multiple other residues with zinc-finger proteins in the databases. The BRCA1 gene is
composed of 23 coding exons arrayed over more than 100 kb of genomic DNA (Fig. 6). Northern blots using fragments of
the BRCA1 cDNA as probes identified a single transcript of about 7.8 kb, present most abundantly in breast, thymus and
testis, and also present in ovary (Fig. 7). Four alternatively spliced products were observed as independent cDNA clones;
3 of these were detected in breast and 2 in ovary mRNA (Fig. 6). A PCR survey from tissue cDNAs further supports the idea
that there is considerable heterogeneity near the 5' end of transcripts from this gene; the molecular basis for the
heterogeneity involves differential choice of the first splice donor site, and the changes detected all alter the transcript in the
region 5' of the identified start codon. We have detected six potential alternate splice donors in this 5' untranslated region,
with the longest deletion being 1,155 bp. The predominant form of the BRCA1 protein in breast and ovary lacks exon
4. The nucleotide sequence for BRCA1 exon 4 is shown in SEQ ID NO:11, with the predicted amino acid sequence
shown in SEQ ID NO:12.
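
A C3HC4 zinc-finger (RING finger) signature of the kind mentioned above can be searched for with a simple pattern. The sketch below uses one commonly cited form of the consensus spacing, C-X2-C-X(9-39)-C-X(1-3)-H-X(2-3)-C-X2-C-X(4-48)-C-X2-C, expressed as a regular expression. This is an illustration and not the Smith-Waterman or BLAST search the authors performed; the toy protein is synthetic and the function name is hypothetical.

```python
import re

# One commonly cited spacing for the C3HC4 (RING finger) consensus
C3HC4 = re.compile(r"C.{2}C.{9,39}C.{1,3}H.{2,3}C.{2}C.{4,48}C.{2}C")

def find_c3hc4(protein):
    """Return (start, matched_segment) pairs for non-overlapping C3HC4-like motifs."""
    return [(m.start(), m.group()) for m in C3HC4.finditer(protein.upper())]

# Synthetic sequence built to satisfy the spacing rules (not a real protein)
toy = ("MK" + "C" + "AA" + "C" + "A" * 11 + "C" + "A" + "H" + "AA"
       + "C" + "AA" + "C" + "A" * 13 + "C" + "AA" + "C" + "KK")

for start, segment in find_c3hc4(toy):
    print(start, len(segment), segment)
```

A pattern match of this kind only flags candidate regions; confirming homology would still require alignment against known zinc-finger proteins.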

Additional 5' sequence of BRCA1 genomic DNA is set forth in SEQ ID NO:13. The G at position 1 represents the
potential start site in testis. The A in position 140 represents the potential start site in somatic tissue. There are six
alternative splice forms of this 5' sequence as shown in Figure 8. The G at position 356 represents the canonical first
splice donor site. The G at position 444 represents the first splice donor site in two clones (testis 1 and testis 2). The G
at position 889 represents the first splice donor site in thymus 3. A fourth splice donor site is the G at position 1230. The
T at position 1513 represents the splice acceptor site for all of the above splice donors. A fifth alternate splice form has
a first splice donor site at position 349 with a first acceptor site at position 591 and a second splice donor site at position
889 and a second acceptor site at position 1513. A sixth alternate form is unspliced in this 5' region. The A at position
1532 is the canonical start site, which appears at position 120 of SEQ ID NO:1. Partial genomic DNA sequences
determined for BRCA1 are set forth in Figures 10A-10H and SEQ ID NOs:14-34. The lower case letters (in Figures
10A-10H) denote intron sequence while the upper case letters denote exon sequence. Indefinite intervals within introns
are designated with vvvvvvvvvvvvv in Figures 10A-10H. The intron/exon junctions are shown in Table 9. The CAG found
at the 5' end of exons 3 and 14 is found in some cDNAs but not in others. Known polymorphic sites are shown in Figures
10A-10H in boldface type and are underlined.
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Low stringency blots in which genomic DNA from organisms of diverse phylogenetic background were probed with 
BRCA1 sequences that lack the zinc-finger region revealed strongly hybridizing fragments in human, monkey, sheep 
and pig and very weak hybridization signals in rodents This result indicates that, apart from the zinc -finger domain, 
BRCA1 is conserved only at a moderate level through evolution 

Germline BRCA1 mutations in 17q-linked kindreds


The most rigorous test for BRCA1 candidate genes is to search for potentially disruptive mutations in carrier
individuals from kindreds that segregate 17q-linked susceptibility to breast and ovarian cancer. Such individuals must
contain BRCA1 alleles that differ from the wildtype sequence. The set of DNA samples used in this analysis consisted of
DNA from individuals representing 8 different BRCA1 kindreds (Table 10).

TABLE 10

KINDRED DESCRIPTIONS AND ASSOCIATED LOD SCORES

                 Cases (n)         Sporadic        LOD
Kindred      Br    Br<50    Ov     Cases(1) (n)    Score     Marker(s)

2082         31     20      22          7           9.49     D17S1327
2099         22     14       2*         0           2.36     D17S800/D17S855(2)
2035         10      8       1*         0           2.25     D17S1327
1901         10      7       1*         0           1.50     D17S855
1925          4      3       0          0           0.55     D17S579
1910          5      4       0          0           0.36     D17S579/D17S250(2)
1927          5      4       0          1          -0.44     D17S250
1911          8      5       0          2          -0.20     D17S250

(1) Number of women with breast cancer (diagnosed under age 50) or ovarian cancer (diagnosed at any
age) who do not share the BRCA1-linked haplotype segregating in the remainder of the cases in the
kindred.

(2) Multipoint LOD score calculated using both markers.

* Kindred contains one individual who had both breast and ovarian cancer; this individual is counted
as a breast cancer case and as an ovarian cancer case.

The logarithm of the odds (LOD) scores in these kindreds range from 9.49 to -0.44 for a set of markers in 17q21.
Four of the families have convincing LOD scores for linkage, and 4 have low positive or negative LOD scores. The latter
kindreds were included because they demonstrate haplotype sharing at chromosome 17q21 for at least 3 affected
members. Furthermore, all kindreds in the set display early age of breast cancer onset, and 4 of the kindreds include
at least one case of ovarian cancer, both hallmarks of BRCA1 kindreds. One kindred, 2082, has nearly equal incidence of
breast and ovarian cancer, an unusual occurrence given the relative rarity of ovarian cancer in the population. All of
the kindreds except two were ascertained in Utah. K2035 is from the Midwest. K2099 is an African-American kindred from
the southern USA.
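
Because a LOD score is the base-10 logarithm of the odds in favor of linkage, the values in Table 10 can be converted to odds directly. The following is a minimal sketch of that conversion, not part of the disclosed method; the dictionary and function names are illustrative assumptions.

```python
# LOD scores copied from Table 10, keyed by kindred number.
LOD_SCORES = {2082: 9.49, 2099: 2.36, 2035: 2.25, 1901: 1.50,
              1925: 0.55, 1910: 0.36, 1927: -0.44, 1911: -0.20}


def lod_to_odds(lod: float) -> float:
    """Odds in favor of linkage implied by a LOD score (LOD = log10 of the odds)."""
    return 10.0 ** lod


def summarize(scores):
    """Return (kindred, LOD, odds) tuples sorted from strongest to weakest evidence."""
    rows = ((kindred, lod, lod_to_odds(lod)) for kindred, lod in scores.items())
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for kindred, lod, odds in summarize(LOD_SCORES):
        print(f"K{kindred}: LOD {lod:+.2f} -> odds of about {odds:,.2f} to 1")
```

A negative LOD score, as in Kindreds 1927 and 1911, corresponds to odds below 1 to 1, that is, the data mildly favor non-linkage at the tested marker.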

In the initial screen for predisposing mutations in BRCA1, DNA from one individual who carries the predisposing
haplotype in each kindred was tested. The 23 coding exons and associated splice junctions were amplified either from
genomic DNA samples or from cDNA prepared from lymphocyte mRNA. When the amplified DNA sequences were
compared to the wildtype sequence, 4 of the 8 kindred samples were found to contain sequence variants (Table 11).

TABLE 11

PREDISPOSING MUTATIONS

Kindred Number    Mutation          Coding Effect         Location*

2082              C->T              Gln->Stop             4056
1910              extra C           frameshift            5385
2099              T->G              Met->Arg              5443
2035              ?                 loss of transcript
1901              11 bp deletion    frameshift            189

* In SEQ ID NO:1


All four sequence variants are heterozygous and each appears in only one of the kindreds. Kindred 2082 contains
a nonsense mutation in exon 11 (Fig. 9A), Kindred 1910 contains a single nucleotide insertion in exon 20 (Fig. 9B), and
Kindred 2099 contains a missense mutation in exon 21, resulting in a Met->Arg substitution. The frameshift and nonsense
mutations are likely disruptive to the function of the BRCA1 product. The peptide encoded by the frameshift allele in
Kindred 1910 would contain an altered amino acid sequence beginning 108 residues from the wildtype C-terminus. The
peptide encoded by the frameshift allele in Kindred 1901 would contain an altered amino acid sequence beginning with
the 24th residue from the wildtype N-terminus. The mutant allele in Kindred 2082 would encode a protein missing 551
residues from the C-terminus. The missense substitution observed in Kindred 2099 is potentially disruptive as it causes
the replacement of a small hydrophobic amino acid (Met) by a large charged residue (Arg). Eleven common
polymorphisms were also identified, 8 in coding sequence and 3 in introns.
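
The residue counts quoted above follow from simple arithmetic on the nucleotide positions in Table 11. The sketch below illustrates that arithmetic under two assumptions taken from elsewhere in this description: the start codon begins at position 120 of SEQ ID NO:1, and the 5,589 bp coding region reported in Example 9 excludes the stop codon, giving 1,863 residues. The function names are illustrative only.

```python
START_CODON_POS = 120                    # first base of the start codon in SEQ ID NO:1
CODING_LENGTH_BP = 5589                  # coding region length reported in Example 9
PROTEIN_LENGTH = CODING_LENGTH_BP // 3   # 1863 residues (assumes stop codon not counted)


def codon_of(nt_pos: int) -> int:
    """1-based codon number that contains nucleotide nt_pos of SEQ ID NO:1."""
    if nt_pos < START_CODON_POS:
        raise ValueError("position lies 5' of the start codon")
    return (nt_pos - START_CODON_POS) // 3 + 1


def residues_lost(first_altered_codon: int) -> int:
    """Number of wild-type residues from first_altered_codon through the C-terminus."""
    return PROTEIN_LENGTH - first_altered_codon + 1


if __name__ == "__main__":
    # Locations from Table 11 (kindred: nucleotide position in SEQ ID NO:1).
    for kindred, pos in {2082: 4056, 1910: 5385, 2099: 5443, 1901: 189}.items():
        codon = codon_of(pos)
        print(f"Kindred {kindred}: nt {pos} -> codon {codon}, "
              f"{residues_lost(codon)} residues from here to the C-terminus")
```

Under these assumptions the nonsense mutation at nucleotide 4056 falls in codon 1313 (551 residues lost), the insertion at 5385 in codon 1756 (108 residues from the C-terminus), the missense change at 5443 in codon 1775, and the deletion at 189 in codon 24, in agreement with the figures given in the text.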

The individual studied in Kindred 2035 evidently contains a regulatory mutation in BRCA1. In her cDNA, a
polymorphic site (A->G at base 3667) appeared homozygous, whereas her genomic DNA revealed heterozygosity at this
position (Fig. 9C). A possible explanation for this observation is that mRNA from her mutated BRCA1 allele is absent
due to a mutation that affects its production or stability. This possibility was explored further by examining 5
polymorphic sites in the BRCA1 coding region, which are separated by as much as 3.5 kb in the BRCA1 transcript. In all
cases where her genomic DNA appeared heterozygous for a polymorphism, cDNA appeared homozygous. In individuals from
other kindreds and in non-haplotype carriers in Kindred 2035, these polymorphic sites could be observed as heterozygous
in cDNA, implying that amplification from cDNA was not biased in favor of one allele. This analysis indicates that a
BRCA1 mutation in Kindred 2035 either prevents transcription or causes instability or aberrant splicing of the BRCA1
transcript.
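
The comparison of genomic DNA and cDNA genotypes described above can be expressed as a small check: a site is informative when genomic DNA is heterozygous, and it suggests loss of one transcript when the cDNA at the same site appears homozygous. The sketch below is illustrative only; the genotype calls and the site labels other than 3667 are hypothetical placeholders, not data from the kindred.

```python
def is_heterozygous(genotype: str) -> bool:
    """True when a two-allele genotype string such as 'AG' carries two different alleles."""
    return len(set(genotype)) == 2


def sites_with_allelic_loss(genomic: dict, cdna: dict) -> list:
    """Sites that are heterozygous in genomic DNA but homozygous in cDNA."""
    return sorted(
        site for site, call in genomic.items()
        if site in cdna and is_heterozygous(call) and not is_heterozygous(cdna[site])
    )


# Hypothetical genotype calls; only base 3667 (A/G) is named in the text.
genomic_calls = {"3667": "AG", "site_B": "AA", "site_C": "AG"}
cdna_calls = {"3667": "AA", "site_B": "AA", "site_C": "GG"}

if __name__ == "__main__":
    print(sites_with_allelic_loss(genomic_calls, cdna_calls))
```

Sites that are homozygous in genomic DNA (such as the hypothetical site_B) are uninformative and are not reported, which mirrors the restriction of the analysis to polymorphisms for which the individual is heterozygous.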


Cosegregation of BRCA1 mutations with BRCA1 haplotypes and population frequency analysis

In addition to potentially disrupting protein function, two criteria must be met for a sequence variant to qualify as a
candidate predisposing mutation. The variant must: 1) be present in individuals from the kindred who carry the
predisposing BRCA1 haplotype and absent in other members of the kindred, and 2) be rare in the general population.

Each mutation was tested for cosegregation with BRCA1. For the frameshift mutation in Kindred 1910, two other
haplotype carriers and one non-carrier were sequenced (Fig. 9B). Only the carriers exhibited the frameshift mutation.
The C to T change in Kindred 2082 created a new AvrII restriction site. Other carriers and non-carriers in the kindred
were tested for the presence of the restriction site (Fig. 9A). An allele-specific oligonucleotide (ASO) was designed to
detect the presence of the sequence variant in Kindred 2099. Several individuals from the kindred, some known to carry
the haplotype associated with the predisposing allele and others known not to carry the associated haplotype, were
screened by ASO for the mutation previously detected in the kindred. In each kindred, the corresponding mutant allele
was detected in individuals carrying the BRCA1-associated haplotype, and was not detected in noncarriers. In the case
of the potential regulatory mutation observed in the individual from Kindred 2035, cDNA and genomic DNA from carriers
in the kindred were compared for heterozygosity at polymorphic sites. In every instance, the extinguished allele in the
cDNA sample was shown to lie on the chromosome that carries the BRCA1 predisposing allele (Fig. 9C).

To exclude the possibility that the mutations were simply common polymorphisms in the population, ASOs for each
mutation were used to screen a set of normal DNA samples. Gene frequency estimates in Caucasians were based on
random samples from the Utah population. Gene frequency estimates in African-Americans were based on 39 samples
provided by M. Pericak-Vance, which originate from African-Americans used in her linkage studies, and 20 newborn
Utah African-Americans. None of the 4 potential predisposing mutations was found in the appropriate control population,
indicating that they are rare in the general population. Thus, two important requirements for BRCA1 susceptibility
alleles were fulfilled by the candidate predisposing mutations: 1) cosegregation of the mutant allele with disease, and
2) absence of the mutant allele in controls, indicating a low gene frequency in the general population.
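
The two requirements summarized above, cosegregation with the predisposing haplotype and rarity in controls, can be written as a simple filter. The sketch below is a hypothetical illustration, not the screening procedure itself: the `Individual` record, the 1% frequency cutoff and the control sample size of 100 are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    carries_haplotype: bool   # carries the BRCA1-linked haplotype
    has_variant: bool         # sequence variant detected (for example by ASO)


def cosegregates(kindred) -> bool:
    """True when the variant is found in every haplotype carrier and in no non-carrier."""
    return all(person.has_variant == person.carries_haplotype for person in kindred)


def is_rare(control_hits: int, controls_screened: int, max_freq: float = 0.01) -> bool:
    """True when the variant frequency among controls is at or below max_freq (assumed cutoff)."""
    if controls_screened <= 0:
        raise ValueError("no controls screened")
    return control_hits / controls_screened <= max_freq


def is_candidate(kindred, control_hits: int, controls_screened: int) -> bool:
    """Apply both criteria for a candidate predisposing mutation."""
    return cosegregates(kindred) and is_rare(control_hits, controls_screened)


if __name__ == "__main__":
    # Kindred 1910 as described in the text: the index carrier, two further carriers
    # and one non-carrier. The control count of 100 is a hypothetical placeholder.
    kindred_1910 = [Individual(True, True)] * 3 + [Individual(False, False)]
    print(is_candidate(kindred_1910, control_hits=0, controls_screened=100))
```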

Phenotypic Expression of BRCA1 Mutations

The effect of the mutations on the BRCA1 protein correlated with differences in the observed phenotypic expression
in the BRCA1 kindreds. Most BRCA1 kindreds have a moderately increased ovarian cancer risk, and a smaller subset
have high risks of ovarian cancer, comparable to those for breast cancer (Easton et al., 1993). Three of the four
kindreds in which BRCA1 mutations were detected fall into the former category, while the fourth (K2082) falls into the
high ovarian cancer risk group. Since the BRCA1 nonsense mutation found in K2082 lies closer to the amino terminus than
the other mutations detected, it might be expected to have a different phenotype. In fact, Kindred 2082 has a high
incidence of ovarian cancer and a later mean age at diagnosis of breast cancer cases than the other kindreds (Goldgar
et al., 1994). This difference in age of onset could be due to an ascertainment bias in the smaller, more highly
penetrant families, or it could reflect tissue-specific differences in the behavior of BRCA1 mutations. The other 3
kindreds that segregate known BRCA1 mutations have, on average, one ovarian cancer for every 10 cases of breast cancer,
but have a high proportion of breast cancer cases diagnosed in their late 20's or early 30's. Kindred 1910, which has a
frameshift mutation, is noteworthy because three of the four affected individuals had bilateral breast cancer, and in
each case the second tumor was diagnosed within a year of the first occurrence. Kindred 2035, which segregates a
potential regulatory BRCA1 mutation, might also be expected to have a dramatic phenotype. Eighty percent of breast
cancer cases in this kindred occur under age 50. This figure is as high as any in the set, suggesting a BRCA1 mutant
allele of high penetrance (Table 10).

Although the mutations described above clearly are deleterious, causing breast cancer in women at very young
ages, each of the four kindreds with mutations includes at least one woman who carries the mutation and who lived until
age 80 without developing a malignancy. It will be of utmost importance in the studies that follow to identify other
genetic or environmental factors that may ameliorate the effects of BRCA1 mutations.

In four of the eight putative BRCA1-linked kindreds, potential predisposing mutations were not found. Three of these
four have LOD scores for BRCA1-linked markers of less than 0.55. Thus, these kindreds may not in reality segregate
BRCA1 predisposing alleles. Alternatively, the mutations in these four kindreds may lie in regions of BRCA1 that, for
example, affect the level of transcript and therefore have thus far escaped detection.

Role of BRCA1 in Cancer

Most tumor suppressor genes identified to date give rise to protein products that are absent, nonfunctional, or
reduced in function. The majority of TP53 mutations are missense; some of these have been shown to produce abnormal
p53 molecules that interfere with the function of the wildtype product (Shaulian et al., 1992; Srivastava et al., 1993).
A similar dominant negative mechanism of action has been proposed for some adenomatous polyposis coli (APC) alleles
that produce truncated molecules (Su et al., 1993), and for point mutations in the Wilms' tumor gene (WT1) that alter
DNA binding of the protein (Little et al., 1993). The nature of the mutations observed in the BRCA1 coding sequence
is consistent with production of either dominant negative proteins or nonfunctional proteins. The regulatory mutation
inferred in Kindred 2035 cannot be a dominant negative; rather, this mutation likely causes reduction or complete loss
of BRCA1 expression from the affected allele.

The BRCA1 protein contains a C3HC4 zinc-finger domain, similar to those found in numerous DNA binding proteins
and implicated in zinc-dependent binding to nucleic acids. The first 180 amino acids of BRCA1 contain five more basic
residues than acidic residues. In contrast, the remainder of the molecule is very acidic, with a net excess of 70 acidic
residues. The excess negative charge is particularly concentrated near the C-terminus. Thus, one possibility is that
BRCA1 encodes a transcription factor with an N-terminal DNA binding domain and a C-terminal transactivational "acidic
blob" domain. Interestingly, another familial tumor suppressor gene, WT1, also contains a zinc-finger motif (Haber et
al., 1990). Many cancer predisposing mutations in WT1 alter zinc-finger domains (Little et al., 1993; Haber et al., 1990;
Little et al., 1992). WT1 encodes a transcription factor, and alternative splicing of exons that encode parts of the
zinc-finger domain alters the DNA binding properties of WT1 (Bickmore et al., 1992). Some alternatively spliced forms
of WT1 mRNA generate molecules that act as transcriptional repressors (Drummond et al., 1994). Some BRCA1 splicing
variants may alter the zinc-finger motif, raising the possibility that a regulatory mechanism similar to that which
occurs in WT1 may apply to BRCA1.

EXAMPLE 9


Analysis of Tumors for BRCA1 Mutations 

To focus the analysis on tumors most likely to contain BRCA1 mutations, primary breast and ovarian carcinomas
were typed for LOH in the BRCA1 region. Three highly polymorphic simple tandem repeat markers were used to assess
LOH: D17S1323 and D17S855, which are intragenic to BRCA1, and D17S1327, which lies approximately 100 kb distal
to BRCA1. The combined LOH frequency in informative cases (i.e., where the germline was heterozygous) was 32/72
(44%) for the breast carcinomas and 12/21 (57%) for the ovarian carcinomas, consistent with previous measurements
of LOH in the region (Futreal et al., 1992b; Jacobs et al., 1993; Sato et al., 1990; Eccles et al., 1990; Cropp et al.,
1994). The analysis thus defined a panel of 32 breast tumors and 12 ovarian tumors of mixed race and age of onset to
be examined for BRCA1 mutations. The complete 5,589 bp coding region and intron/exon boundary sequences of the
gene were screened in this tumor set by direct sequencing alone or by a combination of single-strand conformation
analysis (SSCA) and direct sequencing.

A total of six mutations (of which two are identical) was found, one in an ovarian tumor, four in breast tumors and
one in a male unaffected haplotype carrier (Table 12). One mutation, Glu1541Ter, introduced a stop codon that would
create a truncated protein missing 323 amino acids at the carboxy terminus. In addition, two missense mutations were
identified. These are Ala1708Glu and Met1775Arg and involve substitutions of small, hydrophobic residues by charged
residues. Patients 17764 and 19964 are from the same family. In patient OV24 nucleotide 2575 is deleted, and in patients
17764 and 19964 nucleotides 2993-2996 are deleted.


TABLE 12

Predisposing Mutations

                      Nucleotide        Amino Acid     Age of    Family
Patient    Codon      Change            Change         Onset     History

BT098      1541       GAG -> TAG        Glu -> Stop    39        -
OV24       819        1 bp deletion     frameshift     44        -
BT106      1708       GCG -> GAG        Ala -> Glu     24        +
MC44       1775       ATG -> AGG        Met -> Arg     42        +
17764      958        4 bp deletion     frameshift     31        +
19964      958        4 bp deletion     frameshift               +*

* Unaffected haplotype carrier, male


Several lines of evidence suggest that all five mutations represent BRCA1 susceptibility alleles:

(i) all mutations are present in the germline;

(ii) all are absent in appropriate control populations, suggesting they are not common polymorphisms;

(iii) each mutant allele is retained in the tumor, as is the case in tumors from patients belonging to kindreds that
segregate BRCA1 susceptibility alleles (Smith et al., 1992; Kelsell et al., 1993) (if the mutations represented neutral
polymorphisms, they should be retained in only 50% of the cases);

(iv) the age of onset in the four breast cancer cases with mutations varied between 24 and 42 years of age, consistent
with the early age of onset of breast cancer in individuals with BRCA1 susceptibility; similarly, the ovarian cancer
case was diagnosed at 44, an age that falls in the youngest 13% of all ovarian cancer cases; and finally,

(v) three of the five cases have positive family histories of breast or ovarian cancer found retrospectively in their
medical records, although the tumor set was not selected with regard to this criterion.
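
The argument in point (iii) can be made quantitative with a short worked calculation. If a neutral polymorphism were retained in 50% of tumors showing LOH, the chance that it would be retained in all five tumors (one ovarian and four breast) is 0.5 raised to the fifth power. The sketch below assumes the five tumors are independent events, an assumption not stated in the text; the function name is illustrative.

```python
from math import comb


def prob_retained_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Binomial probability that a neutral allele is retained in at least k of n tumors."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Probability that a neutral allele is retained in all five tumors by chance.
    print(prob_retained_at_least(5, 5))  # 0.03125
```

A chance probability of about 3% is small but not negligible, which is consistent with the text treating allele retention as one of several supporting lines of evidence rather than as proof on its own.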

BT106 was diagnosed at age 24 with breast cancer. Her mother had ovarian cancer, her father had melanoma, and
her paternal grandmother also had breast cancer. Patient MC44, an African-American, had bilateral breast cancer at
age 42. This patient had a sister who died of breast cancer at age 34, another sister who died of lymphoma, and a
brother who died of lung cancer. Her mutation (Met1775Arg) had been detected previously in Kindred 2099, an
African-American family that segregates a BRCA1 susceptibility allele, and was absent in African-American and Caucasian
controls. Patient MC44, to our knowledge, is unrelated to Kindred 2099. The detection of a rare mutant allele once in
a BRCA1 kindred and once in the germline of an apparently unrelated early-onset breast cancer case suggests that
the Met1775Arg change may be a common predisposing mutation in African-Americans. Collectively, these observations
indicate that all four BRCA1 mutations in tumors represent susceptibility alleles; no somatic mutations were detected in
the samples analyzed.

The paucity of somatic BRCA1 mutations is unexpected, given the frequency of LOH on 17q and the usual role of
susceptibility genes as tumor suppressors in cancer progression. There are three possible explanations for this result:
(i) some BRCA1 mutations in coding sequences were missed by our screening procedure, (ii) BRCA1 somatic mutations
fall primarily outside the coding exons, and (iii) LOH events in 17q do not reflect BRCA1 somatic mutations.

If somatic BRCA1 mutations truly are rare in breast and ovary carcinomas, this would have strong implications for
the biology of BRCA1. The apparent lack of somatic BRCA1 mutations implies that there may be some fundamental
difference in the genesis of tumors in genetically predisposed BRCA1 carriers, compared with tumors in the general
population. For example, mutations in BRCA1 may have an effect only on tumor formation at a specific stage early in
breast and ovarian development. This possibility is consistent with a primary function for BRCA1 in premenopausal
breast cancer. Such a model for the role of BRCA1 in breast and ovarian cancer predicts an interaction between
reproductive hormones and BRCA1 function. However, no clinical or pathological differences in familial versus sporadic
breast and ovary tumors, other than age of onset, have been described (Lynch et al., 1990). On the other hand, the recent
finding of increased TP53 mutation and microsatellite instability in breast tumors from patients with a family history of
breast cancer (Glebov et al., 1994) may reflect some difference in tumors that arise in genetically predisposed persons.
The involvement of BRCA1 in this phenomenon can now be addressed directly. Alternatively, the lack of somatic BRCA1
mutations may result from the existence of multiple genes that function in the same pathway of tumor suppression as
BRCA1, but which collectively represent a more favored target for mutation in sporadic tumors. Since mutation of a
single element in a genetic pathway is generally sufficient to disrupt the pathway, BRCA1 might mutate at a rate that is
far lower than the sum of the mutational rates of the other elements.

EXAMPLE 10

Analysis of the BRCA1 Gene 

The structure and function of the BRCA1 gene are determined according to the following methods.

Biological Studies

Mammalian expression vectors containing BRCA1 cDNA are constructed and transfected into appropriate breast
carcinoma cells with lesions in the gene. Wild-type BRCA1 cDNA as well as altered BRCA1 cDNA are utilized. The
altered BRCA1 cDNA can be obtained from altered BRCA1 alleles or produced as described below. Phenotypic reversion
in cultures (e.g., cell morphology, doubling time, anchorage-independent growth) and in animals (e.g., tumorigenicity)
is examined. The studies will employ both wild-type and mutant forms (Section B) of the gene.

Molecular Genetics Studies

In vitro mutagenesis is performed to construct deletion mutants and missense mutants (by single base-pair
substitutions in individual codons and cluster charge alanine scanning mutagenesis). The mutants are used in biological,
biochemical and biophysical studies.

Mechanism Studies

The ability of BRCA1 protein to bind to known and unknown DNA sequences is examined. Its ability to transactivate
promoters is analyzed by transient reporter expression systems in mammalian cells. Conventional procedures such as
particle-capture and yeast two-hybrid system are used to discover and identify any functional partners. The nature and
functions of the partners are characterized. These partners in turn are targets for drug discovery.

Structural Studies

Recombinant proteins are produced in E. coli, yeast, insect and/or mammalian cells and are used in crystallographical
and NMR studies. Molecular modeling of the proteins is also employed. These studies facilitate structure-driven
drug design.

EXAMPLE 11 

Two Step Assay to Detect the Presence of BRCA1 in a Sample

Patient sample is processed according to the method disclosed by Antonarakis et al. (1985), separated through a
1% agarose gel and transferred to nylon membrane for Southern blot analysis. Membranes are UV cross linked at 150
mJ using a GS Gene Linker (Bio-Rad). BRCA1 probe corresponding to nucleotide positions 3631-3930 of SEQ ID NO:1
is subcloned into pTZ18U. The phagemids are transformed into E. coli MV1190 infected with M13K07 helper phage
(Bio-Rad, Richmond, CA). Single stranded DNA is isolated according to standard procedures (see Sambrook et al.,
1989).

Blots are prehybridized for 15-30 min at 65°C in 7% sodium dodecyl sulfate (SDS) in 0.5 M NaPO4. The methods
follow those described by Nguyen et al., 1992. The blots are hybridized overnight at 65°C in 7% SDS, 0.5 M NaPO4
with 25-50 ng/ml single stranded probe DNA. Post-hybridization washes consist of two 30 min washes in 5% SDS, 40
mM NaPO4 at 65°C, followed by two 30 min washes in 1% SDS, 40 mM NaPO4 at 65°C.

Next the blots are rinsed with phosphate buffered saline (pH 6.0) for 5 min at room temperature and incubated with
0.2% casein in PBS for 30-60 min at room temperature and rinsed in PBS for 5 min. The blots are then preincubated
for 5-10 minutes in a shaking water bath at 45°C with hybridization buffer consisting of 6 M urea, 0.3 M NaCl, and 5X
Denhardt's solution (see Sambrook et al., 1989). The buffer is removed and replaced with 50-75 µl/cm² fresh
hybridization buffer plus 2.5 nM of the covalently cross-linked oligonucleotide-alkaline phosphatase conjugate with the
nucleotide sequence complementary to the universal primer site (UP-AP, Bio-Rad). The blots are hybridized for 20-30 min
at 45°C, and post hybridization washes are incubated at 45°C as two 10 min washes in 6 M urea, 1x standard saline citrate
(SSC), 0.1% SDS and one 10 min wash in 1x SSC, 0.1% Triton X-100. The blots are rinsed for 10 min at room
temperature with 1x SSC.

Blots are incubated for 10 min at room temperature with shaking in the substrate buffer consisting of 0.1 M
diethanolamine, 1 mM MgCl2, 0.02% sodium azide, pH 10.0. Individual blots are placed in heat sealable bags with substrate
buffer and 0.2 mM AMPPD (3-(2'-spiroadamantane)-4-methoxy-4-(3'-phosphoryloxy)phenyl-1,2-dioxetane disodium
salt, Bio-Rad). After a 20 min incubation at room temperature with shaking, the excess AMPPD solution is removed.
The blot is exposed to X-ray film overnight. Positive bands indicate the presence of BRCA1.

EXAMPLE 12 

Generation of Polyclonal Antibody against BRCA1 

Segments of BRCA1 coding sequence were expressed as fusion protein in E. coli. The overexpressed protein was
purified by gel elution and used to immunize rabbits and mice using a procedure similar to the one described by Harlow
and Lane, 1988. This procedure has been shown to generate Abs against various other proteins (for example, see
Kraemer et al., 1993).

Briefly, a stretch of BRCA1 coding sequence was cloned as a fusion protein in plasmid pET5A (Novagen, Inc.,
Madison, WI). The BRCA1 incorporated sequence includes the amino acids corresponding to #1361-1554 of SEQ ID
NO:2. After induction with IPTG, the overexpression of a fusion protein with the expected molecular weight was verified
by SDS/PAGE. Fusion protein was purified from the gel by electroelution. The identification of the protein as the BRCA1
fusion product was verified by protein sequencing at the N-terminus. Next, the purified protein was used as immunogen
in rabbits. Rabbits were immunized with 100 µg of the protein in complete Freund's adjuvant and boosted twice at 3
week intervals, first with 100 µg of immunogen in incomplete Freund's adjuvant, followed by 100 µg of immunogen in
PBS. Antibody-containing serum is collected two weeks thereafter.

This procedure is repeated to generate antibodies against the mutant forms of the BRCA1 gene. These antibodies,
in conjunction with antibodies to wild type BRCA1, are used to detect the presence and the relative level of the mutant
forms in various tissues and biological fluids.

EXAMPLE 13 

Generation of Monoclonal Antibodies Specific for BRCA1

Monoclonal antibodies are generated according to the following protocol. Mice are immunized with immunogen
comprising intact BRCA1 or BRCA1 peptides (wild type or mutant) conjugated to keyhole limpet hemocyanin using
glutaraldehyde or EDC, as is well known.

The immunogen is mixed with an adjuvant. Each mouse receives four injections of 10 to 100 µg of immunogen, and
after the fourth injection blood samples are taken from the mice to determine if the serum contains antibody to the
immunogen. Serum titer is determined by ELISA or RIA. Mice with sera indicating the presence of antibody to the
immunogen are selected for hybridoma production.

Spleens are removed from immune mice and a single cell suspension is prepared (see Harlow and Lane, 1988).
Cell fusions are performed essentially as described by Kohler and Milstein, 1975. Briefly, P3.65.3 myeloma cells
(American Type Culture Collection, Rockville, MD) are fused with immune spleen cells using polyethylene glycol as
described by Harlow and Lane, 1988. Cells are plated at a density of 2x10^5 cells/well in 96 well tissue culture plates.
Individual wells are examined for growth, and the supernatants of wells with growth are tested for the presence of BRCA1
specific antibodies by ELISA or RIA using wild type or mutant BRCA1 target protein. Cells in positive wells are expanded
and subcloned to establish and confirm monoclonality.

Clones with the desired specificities are expanded and grown as ascites in mice or in a hollow fiber system to
produce sufficient quantities of antibody for characterization and assay development.

EXAMPLE 14 

Sandwich Assay for BRCA1

Monoclonal antibody is attached to a solid surface such as a plate, tube, bead or particle. Preferably, the antibody
is attached to the well surface of a 96-well ELISA plate. A 100 µl sample (e.g., serum, urine, tissue cytosol) containing
the BRCA1 peptide/protein (wild-type or mutant) is added to the solid phase antibody. The sample is incubated for 2 hrs
at room temperature. Next the sample fluid is decanted, and the solid phase is washed with buffer to remove unbound
material. 100 µl of a second monoclonal antibody (to a different determinant on the BRCA1 peptide/protein) is added
to the solid phase. This antibody is labeled with a detector molecule (e.g., 125I, enzyme, fluorophore, or a chromophore)
and the solid phase with the second antibody is incubated for two hrs at room temperature. The second antibody is
decanted and the solid phase is washed with buffer to remove unbound material.

The amount of bound label, which is proportional to the amount of BRCA1 peptide/protein present in the sample,
is quantitated. Separate assays are performed using monoclonal antibodies which are specific for the wild-type BRCA1
as well as monoclonal antibodies specific for each of the mutations identified in BRCA1.

Industrial Utility 

As previously described above, the present invention provides materials and methods for use in testing BRCA1
alleles of an individual and an interpretation of the normal or predisposing nature of the alleles. Individuals at higher
than normal risk might modify their lifestyles appropriately. In the case of BRCA1, the most significant non-genetic risk
factor is the protective effect of an early, full term pregnancy. Therefore, women at risk could consider early childbearing
or a therapy designed to simulate the hormonal effects of an early full-term pregnancy. Women at high risk would also
strive for early detection and would be more highly motivated to learn and practice breast self examination. Such women
would also be highly motivated to have regular mammograms, perhaps starting at an earlier age than the general
population. Ovarian screening could also be undertaken at greater frequency. Diagnostic methods based on sequence
analysis of the BRCA1 locus could also be applied to tumor detection and classification. Sequence analysis could be
used to diagnose precursor lesions. With the evolution of the method and the accumulation of information about BRCA1
and other causative loci, it could become possible to separate cancers into benign and malignant.

Women with breast cancers may follow different surgical procedures if they are predisposed, and therefore likely
to have additional cancers, than if they are not predisposed. Other therapies may be developed, using either peptides
or small molecules (rational drug design). Peptides could be the missing gene product itself or a portion of the missing
gene product. Alternatively, the therapeutic agent could be another molecule that mimics the deleterious gene's function,
either a peptide or a nonpeptidic molecule that seeks to counteract the deleterious effect of the inherited locus. The
therapy could also be gene based, through introduction of a normal BRCA1 allele into individuals to make a protein
which will counteract the effect of the deleterious allele. These gene therapies may take many forms and may be directed
either toward preventing the tumor from forming, curing a cancer once it has occurred, or stopping a cancer from
metastasizing.

It will be appreciated that the methods and compositions of the instant invention can be incorporated in the form of
a variety of embodiments, only a few of which are disclosed herein. It will be apparent to the artisan that other
embodiments exist and do not depart from the spirit of the invention. Thus, the described embodiments are illustrative and
should not be construed as restrictive.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: 

(A) NAME: MYRIAD GENETICS INC 

(B) STREET: 300 WAKARA WAY 

(C) CITY: SALT LAKE CITY 

(D) STATE: UTAH 

(E) COUNTRY: UNITED STATES OF AMERICA 

(F) POSTAL CODE (ZIP): 84108 


(A) NAME: THE UNIVERSITY OF UTAH RESEARCH FOUNDATION 

(B) STREET: 421 WAKARA WAY, SUITE 170 

(C) CITY: SALT LAKE CITY 

(D) STATE: UTAH 


(E) COUNTRY: UNITED STATES OF AMERICA 

(F) POSTAL CODE (ZIP): 84108 


(A) NAME: THE UNITED STATES OF AMERICA, AS REPRESENTED BY THE 

SECRETARY OF THE DEPARTMENT OF HEALTH AND HUMAN 
SERVICES 

(B) STREET: OFFICE OF TECHNOLOGY TRANSFER, 6011 EXECUTIVE 

BOULEVARD, SUITE 325 

(C) CITY: ROCKVILLE 

(D) STATE: MARYLAND 

(E) COUNTRY: UNITED STATES OF AMERICA 

(F) POSTAL CODE (ZIP): 20852 


(ii) TITLE OF INVENTION: 17q- LINKED BREAST AND OVARIAN CANCER 
SUSCEPTIBILITY GENE 


(iii) NUMBER OF SEQUENCES: 85 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5914 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 120.. 5711 
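
As an illustrative aside, the CDS feature above can be checked arithmetically: positions 120 to 5711 of the 5914-base sequence span 5592 nucleotides, that is, 1864 codons including the terminal stop codon. The following Python sketch is not part of the sequence listing as filed. It translates a nucleotide string with the standard genetic code; the function name `translate` and the sample string (the first sixteen codons of the coding sequence of SEQ ID NO:1) are chosen here for illustration only.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str) -> str:
    """Translate a DNA string codon by codon into one-letter amino acid codes."""
    seq = "".join(nucleotides.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# CDS coordinates as given in the feature table (1-based, inclusive).
cds_start, cds_end = 120, 5711
cds_length = cds_end - cds_start + 1  # number of nucleotides in the CDS
codon_count = cds_length // 3         # number of codons, stop codon included

# First sixteen codons of the coding sequence (illustrative sample).
first_codons = "ATG GAT TTA TCT GCT CTT CGC GTT GAA GAA GTA CAA AAT GTC ATT AAT"
print(cds_length, codon_count, translate(first_codons))
```

Under these assumptions the sample translates to Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn, in agreement with the first row of amino acids given for SEQ ID NO:1.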


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC  60

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAA  119

ATG GAT TTA TCT GCT CTT CGC GTT GAA GAA GTA CAA AAT GTC ATT AAT  167
Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

GCT ATG CAG AAA ATC TTA GAG TGT CCC ATC TGT CTG GAG TTG ATC AAG  215
Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

GAA CCT GTC TCC ACA AAG TGT GAC CAC ATA TTT TGC AAA TTT TGC ATG  263
Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

CTG AAA CTT CTC AAC CAG AAG AAA GGG CCT TCA CAG TGT CCT TTA TGT  311
Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

AAG AAT GAT ATA ACC AAA AGG AGC CTA CAA GAA AGT ACG AGA TTT AGT  359
Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

CAA CTT GTT GAA GAG CTA TTG AAA ATC ATT TGT GCT TTT CAG CTT GAC  407
Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

ACA GGT TTG GAG TAT GCA AAC AGC TAT AAT TTT GCA AAA AAG GAA AAT  455
Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

AAC TCT CCT GAA CAT CTA AAA GAT GAA GTT TCT ATC ATC CAA AGT ATG  503
Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

GGC TAC AGA AAC CGT GCC AAA AGA CTT CTA CAG AGT GAA CCC GAA AAT  551
Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

CCT TCC TTG CAG GAA ACC AGT CTC AGT GTC CAA CTC TCT AAC CTT GGA  599
Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

ACT GTG AGA ACT CTG AGG ACA AAG CAG CGG ATA CAA CCT CAA AAG ACG  647
Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

TCT GTC TAC ATT GAA TTG GGA TCT GAT TCT TCT GAA GAT ACC GTT AAT  695
Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

AAG GCA ACT TAT TGC AGT GTG GGA GAT CAA GAA TTG TTA CAA ATC ACC  743
Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

CCT CAA GGA ACC AGG GAT GAA ATC AGT TTG GAT TCT GCA AAA AAG GCT  791
Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

GCT TGT GAA TTT TCT GAG ACG GAT GTA ACA AAT ACT GAA CAT CAT CAA  839
Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

CCC AGT AAT AAT GAT TTG AAC ACC ACT GAG AAG CGT GCA GCT GAG AGG  887
Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

CAT CCA GAA AAG TAT CAG GGT AGT TCT GTT TCA AAC TTG CAT GTG GAG  935
His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

CCA TGT GGC ACA AAT ACT CAT GCC AGC TCA TTA CAG CAT GAG AAC AGC  983
Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

AGT TTA TTA CTC ACT AAA GAC AGA ATG AAT GTA GAA AAG GCT GAA TTC  1031
Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

TGT AAT AAA AGC AAA CAG CCT GGC TTA GCA AGG AGC CAA CAT AAC AGA  1079
Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

TGG GCT GGA AGT AAG GAA ACA TGT AAT GAT AGG CGG ACT CCC AGC ACA  1127
Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

GAA AAA AAG GTA GAT CTG AAT GCT GAT CCC CTG TGT GAG AGA AAA GAA  1175
Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

TGG AAT AAG CAG AAA CTG CCA TGC TCA GAG AAT CCT AGA GAT ACT GAA  1223
Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365

GAT GTT CCT TGG ATA ACA CTA AAT AGC AGC ATT CAG AAA GTT AAT GAG  1271
Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

TGG TTT TCC AGA AGT GAT GAA CTG TTA GGT TCT GAT GAC TCA CAT GAT  1319
Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

GGG GAG TCT GAA TCA AAT GCC AAA GTA GCT GAT GTA TTG GAC GTT CTA  1367
Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

AAT GAG GTA GAT GAA TAT TCT GGT TCT TCA GAG AAA ATA GAC TTA CTG  1415
Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

GCC AGT GAT CCT CAT GAG GCT TTA ATA TGT AAA AGT GAA AGA GTT CAC  1463
Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

TCC AAA TCA GTA GAG AGT AAT ATT GAA GAC AAA ATA TTT GGG AAA ACC  1511
Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

TAT CGG AAG AAG GCA AGC CTC CCC AAC TTA AGC CAT GTA ACT GAA AAT  1559
Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

CTA ATT ATA GGA GCA TTT GTT ACT GAG CCA CAG ATA ATA CAA GAG CGT  1607
Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

CCC CTC ACA AAT AAA TTA AAG CGT AAA AGG AGA CCT ACA TCA GGC CTT  1655
Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

CAT CCT GAG GAT TTT ATC AAG AAA GCA GAT TTG GCA GTT CAA AAG ACT  1703
His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

CCT GAA ATG ATA AAT CAG GGA ACT AAC CAA ACG GAG CAG AAT GGT CAA  1751
Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

GTG ATG AAT ATT ACT AAT AGT GGT CAT GAG AAT AAA ACA AAA GGT GAT  1799
Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

TCT ATT CAG AAT GAG AAA AAT CCT AAC CCA ATA GAA TCA CTC GAA AAA  1847
Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

GAA TCT GCT TTC AAA ACG AAA GCT GAA CCT ATA AGC AGC AGT ATA AGC  1895
Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

AAT ATG GAA CTC GAA TTA AAT ATC CAC AAT TCA AAA GCA CCT AAA AAG  1943
Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

AAT AGG CTG AGG AGG AAG TCT TCT ACC AGG CAT ATT CAT GCG CTT GAA  1991
Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

CTA GTA GTC AGT AGA AAT CTA AGC CCA CCT AAT TGT ACT GAA TTG CAA  2039
Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

ATT GAT AGT TGT TCT AGC AGT GAA GAG ATA AAG AAA AAA AAG TAC AAC  2087
Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

CAA ATG CCA GTC AGG CAC AGC AGA AAC CTA CAA CTC ATG GAA GGT AAA  2135
Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

GAA CCT GCA ACT GGA GCC AAG AAG AGT AAC AAG CCA AAT GAA CAG ACA  2183
Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

AGT AAA AGA CAT GAC AGC GAT ACT TTC CCA GAG CTG AAG TTA ACA AAT  2231
Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

GCA CCT GGT TCT TTT ACT AAG TGT TCA AAT ACC AGT GAA CTT AAA GAA  2279
Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

TTT GTC AAT CCT AGC CTT CCA AGA GAA GAA AAA GAA GAG AAA CTA GAA  2327
Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

ACA GTT AAA GTG TCT AAT AAT GCT GAA GAC CCC AAA GAT CTC ATG TTA  2375
Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

AGT GGA GAA AGG GTT TTG CAA ACT GAA AGA TCT GTA GAG AGT AGC AGT  2423
Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

ATT TCA TTG GTA CCT GGT ACT GAT TAT GGC ACT CAG GAA AGT ATC TCG  2471
Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

TTA CTG GAA GTT AGC ACT CTA GGG AAG GCA AAA ACA GAA CCA AAT AAA  2519
Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

TGT GTG AGT CAG TGT GCA GCA TTT GAA AAC CCC AAG GGA CTA ATT CAT  2567
Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

GGT TGT TCC AAA GAT AAT AGA AAT GAC ACA GAA GGC TTT AAG TAT CCA  2615
Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

TTG GGA CAT GAA GTT AAC CAC AGT CGG GAA ACA AGC ATA GAA ATG GAA  2663
Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

GAA AGT GAA CTT GAT GCT CAG TAT TTG CAG AAT ACA TTC AAG GTT TCA  2711
Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser
850 855 860

AAG CGC CAG TCA TTT GCT CCG TTT TCA AAT CCA GGA AAT GCA GAA GAG  2759
Lys Arg Gln Ser Phe Ala Pro Phe Ser Asn Pro Gly Asn Ala Glu Glu
865 870 875 880

GAA TGT GCA ACA TTC TCT GCC CAC TCT GGG TCC TTA AAG AAA CAA AGT  2807
Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gln Ser
885 890 895

CCA AAA GTC ACT TTT GAA TGT GAA CAA AAG GAA GAA AAT CAA GGA AAG  2855
Pro Lys Val Thr Phe Glu Cys Glu Gln Lys Glu Glu Asn Gln Gly Lys
900 905 910

AAT GAG TCT AAT ATC AAG CCT GTA CAG ACA GTT AAT ATC ACT GCA GGC  2903
Asn Glu Ser Asn Ile Lys Pro Val Gln Thr Val Asn Ile Thr Ala Gly
915 920 925

TTT CCT GTG GTT GGT CAG AAA GAT AAG CCA GTT GAT AAT GCC AAA TGT  2951
Phe Pro Val Val Gly Gln Lys Asp Lys Pro Val Asp Asn Ala Lys Cys
930 935 940

AGT ATC AAA GGA GGC TCT AGG TTT TGT CTA TCA TCT CAG TTC AGA GGC  2999
Ser Ile Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gln Phe Arg Gly
945 950 955 960

AAC GAA ACT GGA CTC ATT ACT CCA AAT AAA CAT GGA CTT TTA CAA AAC  3047
Asn Glu Thr Gly Leu Ile Thr Pro Asn Lys His Gly Leu Leu Gln Asn
965 970 975

CCA TAT CGT ATA CCA CCA CTT TTT CCC ATC AAG TCA TTT GTT AAA ACT  3095
Pro Tyr Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Val Lys Thr
980 985 990

AAA TGT AAG AAA AAT CTG CTA GAG GAA AAC TTT GAG GAA CAT TCA ATG  3143
Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met
995 1000 1005

TCA CCT GAA AGA GAA ATG GGA AAT GAG AAC ATT CCA AGT ACA GTG AGC  3191
Ser Pro Glu Arg Glu Met Gly Asn Glu Asn Ile Pro Ser Thr Val Ser
1010 1015 1020

ACA ATT AGC CGT AAT AAC ATT AGA GAA AAT GTT TTT AAA GAA GCC AGC  3239
Thr Ile Ser Arg Asn Asn Ile Arg Glu Asn Val Phe Lys Glu Ala Ser
1025 1030 1035 1040

TCA AGC AAT ATT AAT GAA GTA GGT TCC AGT ACT AAT GAA GTG GGC TCC  3287
Ser Ser Asn Ile Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser
1045 1050 1055

AGT ATT AAT GAA ATA GGT TCC AGT GAT GAA AAC ATT CAA GCA GAA CTA  3335
Ser Ile Asn Glu Ile Gly Ser Ser Asp Glu Asn Ile Gln Ala Glu Leu
1060 1065 1070

GGT AGA AAC AGA GGG CCA AAA TTG AAT GCT ATG CTT AGA TTA GGG GTT  3383
Gly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Gly Val
1075 1080 1085

TTG CAA CCT GAG GTC TAT AAA CAA AGT CTT CCT GGA AGT AAT TGT AAG  3431
Leu Gln Pro Glu Val Tyr Lys Gln Ser Leu Pro Gly Ser Asn Cys Lys
1090 1095 1100

CAT CCT GAA ATA AAA AAG CAA GAA TAT GAA GAA GTA GTT CAG ACT GTT  3479
His Pro Glu Ile Lys Lys Gln Glu Tyr Glu Glu Val Val Gln Thr Val
1105 1110 1115 1120

AAT ACA GAT TTC TCT CCA TAT CTG ATT TCA GAT AAC TTA GAA CAG CCT  3527
Asn Thr Asp Phe Ser Pro Tyr Leu Ile Ser Asp Asn Leu Glu Gln Pro
1125 1130 1135

ATG GGA AGT AGT CAT GCA TCT CAG GTT TGT TCT GAG ACA CCT GAT GAC  3575
Met Gly Ser Ser His Ala Ser Gln Val Cys Ser Glu Thr Pro Asp Asp
1140 1145 1150

CTG TTA GAT GAT GGT GAA ATA AAG GAA GAT ACT AGT TTT GCT GAA AAT  3623
Leu Leu Asp Asp Gly Glu Ile Lys Glu Asp Thr Ser Phe Ala Glu Asn
1155 1160 1165

GAC ATT AAG GAA AGT TCT GCT GTT TTT AGC AAA AGC GTC CAG AAA GGA  3671
Asp Ile Lys Glu Ser Ser Ala Val Phe Ser Lys Ser Val Gln Lys Gly
1170 1175 1180

GAG CTT AGC AGG AGT CCT AGC CCT TTC ACC CAT ACA CAT TTG GCT CAG  3719
Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gln
1185 1190 1195 1200

GGT TAC CGA AGA GGG GCC AAG AAA TTA GAG TCC TCA GAA GAG AAC TTA  3767
Gly Tyr Arg Arg Gly Ala Lys Lys Leu Glu Ser Ser Glu Glu Asn Leu
1205 1210 1215

TCT AGT GAG GAT GAA GAG CTT CCC TGC TTC CAA CAC TTG TTA TTT GGT  3815
Ser Ser Glu Asp Glu Glu Leu Pro Cys Phe Gln His Leu Leu Phe Gly
1220 1225 1230

AAA GTA AAC AAT ATA CCT TCT CAG TCT ACT AGG CAT AGC ACC GTT GCT  3863
Lys Val Asn Asn Ile Pro Ser Gln Ser Thr Arg His Ser Thr Val Ala
1235 1240 1245

ACC GAG TGT CTG TCT AAG AAC ACA GAG GAG AAT TTA TTA TCA TTG AAG  3911
Thr Glu Cys Leu Ser Lys Asn Thr Glu Glu Asn Leu Leu Ser Leu Lys
1250 1255 1260

AAT AGC TTA AAT GAC TGC AGT AAC CAG GTA ATA TTG GCA AAG GCA TCT  3959
Asn Ser Leu Asn Asp Cys Ser Asn Gln Val Ile Leu Ala Lys Ala Ser
1265 1270 1275 1280

CAG GAA CAT CAC CTT AGT GAG GAA ACA AAA TGT TCT GCT AGC TTG TTT  4007
Gln Glu His His Leu Ser Glu Glu Thr Lys Cys Ser Ala Ser Leu Phe
1285 1290 1295

TCT TCA CAG TGC AGT GAA TTG GAA GAC TTG ACT GCA AAT ACA AAC ACC  4055
Ser Ser Gln Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Thr
1300 1305 1310

CAG GAT CCT TTC TTG ATT GGT TCT TCC AAA CAA ATG AGG CAT CAG TCT  4103
Gln Asp Pro Phe Leu Ile Gly Ser Ser Lys Gln Met Arg His Gln Ser
1315 1320 1325

GAA AGC CAG GGA GTT GGT CTG AGT GAC AAG GAA TTG GTT TCA GAT GAT  4151
Glu Ser Gln Gly Val Gly Leu Ser Asp Lys Glu Leu Val Ser Asp Asp
1330 1335 1340

GAA GAA AGA GGA ACG GGC TTG GAA GAA AAT AAT CAA GAA GAG CAA AGC  4199
Glu Glu Arg Gly Thr Gly Leu Glu Glu Asn Asn Gln Glu Glu Gln Ser
1345 1350 1355 1360

ATG GAT TCA AAC TTA GGT GAA GCA GCA TCT GGG TGT GAG AGT GAA ACA  4247
Met Asp Ser Asn Leu Gly Glu Ala Ala Ser Gly Cys Glu Ser Glu Thr
1365 1370 1375

AGC GTC TCT GAA GAC TGC TCA GGG CTA TCC TCT CAG AGT GAC ATT TTA  4295
Ser Val Ser Glu Asp Cys Ser Gly Leu Ser Ser Gln Ser Asp Ile Leu
1380 1385 1390

ACC ACT CAG CAG AGG GAT ACC ATG CAA CAT AAC CTG ATA AAG CTC CAG  4343
Thr Thr Gln Gln Arg Asp Thr Met Gln His Asn Leu Ile Lys Leu Gln
1395 1400 1405

CAG GAA ATG GCT GAA CTA GAA GCT GTG TTA GAA CAG CAT GGG AGC CAG  4391
Gln Glu Met Ala Glu Leu Glu Ala Val Leu Glu Gln His Gly Ser Gln
1410 1415 1420

CCT TCT AAC AGC TAC CCT TCC ATC ATA AGT GAC TCT TCT GCC CTT GAG  4439
Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Leu Glu
1425 1430 1435 1440

GAC CTG CGA AAT CCA GAA CAA AGC ACA TCA GAA AAA GCA GTA TTA ACT  4487
Asp Leu Arg Asn Pro Glu Gln Ser Thr Ser Glu Lys Ala Val Leu Thr
1445 1450 1455

TCA CAG AAA AGT AGT GAA TAC CCT ATA AGC CAG AAT CCA GAA GGC CTT  4535
Ser Gln Lys Ser Ser Glu Tyr Pro Ile Ser Gln Asn Pro Glu Gly Leu
1460 1465 1470

TCT GCT GAC AAG TTT GAG GTG TCT GCA GAT AGT TCT ACC AGT AAA AAT  4583
Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn
1475 1480 1485

AAA GAA CCA GGA GTG GAA AGG TCA TCC CCT TCT AAA TGC CCA TCA TTA  4631
Lys Glu Pro Gly Val Glu Arg Ser Ser Pro Ser Lys Cys Pro Ser Leu
1490 1495 1500

GAT GAT AGG TGG TAC ATG CAC AGT TGC TCT GGG AGT CTT CAG AAT AGA  4679
Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gln Asn Arg
1505 1510 1515 1520

AAC TAC CCA TCT CAA GAG GAG CTC ATT AAG GTT GTT GAT GTG GAG GAG  4727
Asn Tyr Pro Ser Gln Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu
1525 1530 1535

CAA CAG CTG GAA GAG TCT GGG CCA CAC GAT TTG ACG GAA ACA TCT TAC  4775
Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr
1540 1545 1550

TTG CCA AGG CAA GAT CTA GAG GGA ACC CCT TAC CTG GAA TCT GGA ATC  4823
Leu Pro Arg Gln Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile
1555 1560 1565

AGC CTC TTC TCT GAT GAC CCT GAA TCT GAT CCT TCT GAA GAC AGA GCC  4871
Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala
1570 1575 1580

CCA GAG TCA GCT CGT GTT GGC AAC ATA CCA TCT TCA ACC TCT GCA TTG  4919
Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu
1585 1590 1595 1600

AAA GTT CCC CAA TTG AAA GTT GCA GAA TCT GCC CAG AGT CCA GCT GCT  4967
Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Ser Pro Ala Ala
1605 1610 1615

GCT CAT ACT ACT GAT ACT GCT GGG TAT AAT GCA ATG GAA GAA AGT GTG  5015
Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val
1620 1625 1630

AGC AGG GAG AAG CCA GAA TTG ACA GCT TCA ACA GAA AGG GTC AAC AAA  5063
Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys
1635 1640 1645

AGA ATG TCC ATG GTG GTG TCT GGC CTG ACC CCA GAA GAA TTT ATG CTC  5111
Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu
1650 1655 1660

GTG TAC AAG TTT GCC AGA AAA CAC CAC ATC ACT TTA ACT AAT CTA ATT  5159
Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile
1665 1670 1675 1680

ACT GAA GAG ACT ACT CAT GTT GTT ATG AAA ACA GAT GCT GAG TTT GTG  5207
Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val
1685 1690 1695

TGT GAA CGG ACA CTG AAA TAT TTT CTA GGA ATT GCG GGA GGA AAA TGG  5255
Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp
1700 1705 1710

GTA GTT AGC TAT TTC TGG GTG ACC CAG TCT ATT AAA GAA AGA AAA ATG  5303
Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met
1715 1720 1725

CTG AAT GAG CAT GAT TTT GAA GTC AGA GGA GAT GTG GTC AAT GGA AGA  5351
Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg
1730 1735 1740

AAC CAC CAA GGT CCA AAG CGA GCA AGA GAA TCC CAG GAC AGA AAG ATC  5399
Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile
1745 1750 1755 1760

TTC AGG GGG CTA GAA ATC TGT TGC TAT GGG CCC TTC ACC AAC ATG CCC  5447
Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro
1765 1770 1775

ACA GAT CAA CTG GAA TGG ATG GTA CAG CTG TGT GGT GCT TCT GTG GTG  5495
Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val
1780 1785 1790

AAG GAG CTT TCA TCA TTC ACC CTT GGC ACA GGT GTC CAC CCA ATT GTG  5543
Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val
1795 1800 1805

GTT GTG CAG CCA GAT GCC TGG ACA GAG GAC AAT GGC TTC CAT GCA ATT  5591
Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile
1810 1815 1820

GGG CAG ATG TGT GAG GCA CCT GTG GTG ACC CGA GAG TGG GTG TTG GAC  5639
Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp
1825 1830 1835 1840

AGT GTA GCA CTC TAC CAG TGC CAG GAG CTG GAC ACC TAC CTG ATA CCC  5687
Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr Tyr Leu Ile Pro
1845 1850 1855

CAG ATC CCC CAC AGC CAC TAC TGA CTGCAGCCAG CCACAGGTAC AGAGCCACAG  5741
Gln Ile Pro His Ser His Tyr *
1860

GACCCCAAGA ATGAGCTTAC AAAGTGGCCT TTCCAGGCCC TGGGAGCTCC TCTCACTCTT  5801

CAGTCCTTCT ACTGTCCTGG CTACTAAATA TTTTATGTAC ATCAGCCTGA AAAGGACTTC  5861

TGGCTATGCA AGGGTCCCTT AAAGATTTTC TGCTTGAAGT CTCCCTTGGA AAT  5914




(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1864 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser
850 855 860

Lys Arg Gln Ser Phe Ala Pro Phe Ser Asn Pro Gly Asn Ala Glu Glu
865 870 875 880

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gln Ser
885 890 895

Pro Lys Val Thr Phe Glu Cys Glu Gln Lys Glu Glu Asn Gln Gly Lys
900 905 910

Asn Glu Ser Asn Ile Lys Pro Val Gln Thr Val Asn

lie 

Thr 

Ala 

Gly 

35 



915 





920 





925 





Phe 

Pro 

Val 

Val 

Gly Gin 

Lys Asp 

Lys 

Pro 

Val 

Asp Asn 

Ala 

Lys 

Cys 



930 





935 





940 





40 

Ser 

lie 

Lys 

Gly Gly Ser Arg 

Phe 

Cys 

Leu 

Ser 

Ser 

Gin 

Phe 

Arg Gly 

945 





950 





955 





960 


Asn 

Glu 

Thr 

Gly Leu 

lie 

Thr 

Pro 

Asn 

Lys 

His 

Gly Leu 

Leu 

Gin 

Asn 






965 





970 





975 


45 

Pro 

Tyr 

Arg 

lie 

Pro 

Pro 

Leu 

Phe 

Pro 

lie 

Lys 

Ser 

Phe 

Val 

Lys 

Thr 





980 





985 





990 




Lys 

Cys 

Lys 

Lys 

Asn 

Leu 

Leu 

Glu 

Glu 

Asn 

Phe 

Glu 

Glu 

His 

Ser 

Met 

50 



995 





1000 




1005 




Ser 

Pro 

Glu 

Arg 

Glu 

Met 

Gly Asn Glu 

Asn 

lie 

Pro 

Ser 

Thr 

Val 

Ser 



1010 




1015 




1020 




55 

Thr 

lie 

Ser 

Arg 

Asn 

Asn 

lie 

Arg 

Glu 

Asn 

Val 

Phe 

Lys 

Glu 

Ala 

Ser 


1025 1030 1035 1040 
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’5 
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30 


35 


40 


45 
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Ser 

Ser 

Asn 

He 

Asn 

Glu 

Val 

Gly Ser 

Ser 

Thr 

Asn 

Glu 

Val 

Gly Ser 





1045 




1050 




1055 

Ser 

lie 

Asn 

Glu 

lie 

Gly 

Ser 

Ser 

Asp 

Glu 

Asn 

lie 

Gin 

Ala 

Glu 

Leu 




1060 




1065 




1070 


Gly Arg 

Asn 

Arg Gly 

Pro 

Lys 

Leu 

Asn 

Ala 

Met 

Leu 

Arg 

Leu 

Gly 

Val 



1075 




1080 




1085 



Leu 

Gin 

Pro 

Glu 

Val 

Tyr 

Lys 

Gin 

Ser 

Leu 

Pro 

Gly 

Ser 

Asn 

Cys 

Lys 


1090 




1095 




1100 




His 

Pro 

Glu 

lie 

Lys 

Lys 

Gin 

Glu 

Tyr 

Glu 

Glu 

Val 

Val 

Gin 

Thr 

Val 

1105 




1110 




1115 




1120 

Asn 

Thr 

Asp 

Phe 

Ser 

Pro 

Tyr 

Leu 

lie 

Ser 

Asp 

Asn 

Leu 

Glu 

Gin 

Pro 





1125 




1130 




1135 

Met 

Gly Ser 

Ser 

His 

Ala 

Ser 

Gin 

Val 

Cys 

Ser 

Glu 

Thr 

Pro Asp 

Asp 




1140 




1145 




1150 


Leu 

Leu 

Asp 

Asp 

Gly Glu 

lie 

Lys 

Glu Asp 

Thr 

Ser 

Phe 

Ala 

Glu 

Asn 



1155 




1160 




1165 



Asp 

He 

Lys 

Glu 

Ser 

Ser 

Ala 

Val 

Phe 

Ser 

Lys 

Ser 

Val 

Gin 

Lys 

Gly 


1170 




1175 




1180 




Glu 

Leu 

Ser 

Arg 

Ser 

Pro 

Ser 

Pro 

Phe 

Thr 

His 

Thr 

His 

Leu 

Ala 

Gin 

1185 




1190 




1195 




1200 

Gly 

Tyr 

Arg 

Arg 

Gly Ala 

Lys 

Lys 

Leu 

Glu 

Ser 

Ser 

Glu 

Glu 

Asn 

Leu 





1205 




1210 




1215 

Ser 

Ser 

Glu 

Asp 

Glu 

Glu 

Leu 

Pro 

Cys 

Phe 

Gin 

His 

Leu 

Leu 

Phe Gly 




1220 




1225 




1230 


Lys 

Val 

Asn 

Asn 

lie 

Pro 

Ser 

Gin 

Ser 

Thr 

Arg 

His 

Ser 

Thr 

Val 

Ala 



1235 




1240 




1245 



Thr 

Glu 

Cys 

Leu 

Ser 

Lys 

Asn 

Thr 

Glu 

Glu 

Asn 

Leu 

Leu 

Ser 

Leu 

Lys 


1250 




1255 




1260 




Asn 

Ser 

Leu 

Asn 

Asp 

Cys 

Ser 

Asn 

Gin 

Val 

lie 

Leu 

Ala 

Lys 

Ala 

Ser 

1265 




1270 




1275 




1280 

Gin 

Glu 

His 

His 

Leu 

Ser 

Glu 

Glu 

Thr 

Lys 

Cys 

Ser 

Ala 

Ser 

Leu 

Phe 





1285 




1290 




1295 

Ser 

Ser 

Gin 

Cys 

Ser 

Glu 

Leu 

Glu 

Asp 

Leu 

Thr 

Ala 

Asn 

Thr 

Asn 

Thr 




1300 




1305 




1310 


Gin 

Asp 

Pro 

Phe 

Leu 

lie Gly 

Ser 

Ser 

Lys 

Gin 

Met 

Arg 

His 

Gin 

Ser 



1315 




1320 




1325 



Glu 

Ser 

Gin 

Gly 

Val 

Gly Leu 

Ser Asp 

Lys 

Glu 

Leu 

Val 

Ser 

Asp 

Asp 


1330 1335 1340 
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Glu 

Glu 

Arg 

Gly Thr Gly 

Leu 

Glu 

Glu 

Asn 

Asn 

Gin 

Glu 

Glu 

Gin Ser 


1345 




1350 




1355 



1360 

5 

Met 

Asp 

Ser 

Asn 

Leu Gly 

Glu 

Ala 

Ala 

Ser Gly 

Cys 

Glu 

Ser 

Glu Thr 






1365 




1370 




1375 


Ser 

Val 

Ser 

Glu Asp 

Cys 

Ser 

Gly 

Leu 

Ser 

Ser 

Gin 

Ser 

Asp 

lie Leu 

10 




1380 




1385 




1390 

Thr 

Thr 

Gin 

Gin 

Arg 

Asp 

Thr 

Met 

Gin 

His 

Asn 

Leu 

lie 

Lys 

Leu Gin 




1395 




1400 




1405 



Gin 

Glu 

Met 

Ala 

Glu 

Leu 

Glu 

Ala 

Val 

Leu 

Glu 

Glu 

His 

Gly Ser Gin 

IS 


1410 




1415 




1420 




Pro 

Ser 

Asn 

Ser 

Tyr 

Pro 

Ser 

He 

lie 

Ser 

Asp 

Ser 

Ser 

Ala 

Leu Glu 


1425 




1430 




1435 



1440 

20 

Asp 

Leu 

Arg 

Asn 

Pro 

Glu 

Gin 

Ser 

Thr 

Ser 

Glu 

Lys 

Ala 

Val 

Leu Thr 





1445 




1450 




1455 


Ser 

Gin 

Lys 

Ser 

Ser 

Glu 

Tyr 

Pro 

lie 

Ser 

Gin 

Asn 

Pro 

Glu Gly Leu 





1460 




1465 




1470 

25 

Ser 

Ala 

Asp 

Lys 

Phe 

Glu 

Val 

Ser 

Ala 

Asp 

Ser 

Ser 

Thr 

Ser 

Lys Asn 




1475 




1480 




14 8 5 



Lys 

Glu 

Pro 

Gly 

Val 

Glu 

Arg 

Ser 

Ser 

Pro 

Ser 

Lys 

Cys 

Pro 

Ser Leu 

30 


1490 




1495 




1500 




Asp 

Asp 

Arg 

Trp 

Tyr 

Met 

His 

Ser 

Cys 

Ser 

Gly 

Ser 

Leu 

Gin 

Asn Arg 


1505 




1510 




1515 



1520 


Asn 

Tyr 

Pro 

Ser 

Gin 

Glu 

Glu 

Leu 

lie 

Lys 

Val 

Val 

Asp 

Val 

Glu Glu 

35 





1525 




1530 




1535 


Gin 

Gin 

Leu 

Glu 

Glu 

Ser 

Gly 

Pro 

His 

Asp 

Leu 

Thr 

Glu 

Thr 

Ser Tyr 





1540 




1545 




1550 

40 

Leu 

Pro 

Arg 

Gin 

Asp 

Leu 

Glu 

Gly 

Thr 

Pro Tyr 

Leu 

Glu 

Ser 

Gly lie 




1555 




1560 




1565 



Ser 

Leu 

Phe 

Ser 

Asp 

Asp 

Pro 

Glu 

Ser 

Asp Pro 

Ser 

Glu Asp Arg Ala 



1570 




1575 




1580 



45 

Pro 

Glu 

Ser 

Ala 

Arg 

Val 

Gly Asn 

lie 

Pro 

Ser 

Ser 

Thr 

Ser 

Ala Leu 


1585 1590 1595 1600 

Lys Val Pro Gin Leu Lys Val Ala Glu Ser Ala Gin Ser Pro Ala Ala 
1605 1610 1615 

Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val 
1620 1625 1630 

Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys 
1635 1640 1645 
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IS 


20 


25 


30 


35 


40 


45 


50 
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Arg 

Met 

Ser 

Met 

Val 

Val 

Ser 

Gly 

Leu 

Thr 

Pro 

G iu 

Glu 

Phe 

Met 

Leu 


1650 




1655 




1660 




Val 

Tyr 

Lys 

Phe 

Ala 

Arg 

Lys 

His 

His 

lie 

Thr 

Leu 

Thr 

Asn 

Leu 

lie 

1665 




1670 




1675 




1680 

Thr 

Glu 

Glu 

Thr 

Thr 

His 

Val 

Val 

Met 

Lys 

Thr 

Asp 

Ala 

Glu 

Phe 

Val 





1685 




1690 




1695 

Cys 

Glu 

Arg 

Thr 

Leu 

Lys 

Tyr 

Phe 

Leu Gly 

He 

Ala 

Gly 

Gly 

Lys 

Trp 




1700 




1705 




1710 


Val 

Val 

Ser 

Tyr 

Phe 

Trp 

Val 

Thr 

Gin 

Ser 

lie 

Lys 

Glu 

Arg 

Lys 

Met 



171! 

> 




1720 




1725 



Leu 

Asn 

Glu 

His 

Asp 

Phe 

Glu 

Val 

Arg 

Gly 

Asp 

Val 

Val 

Asn 

Gly 

Arg 


1730 




1735 




1740 




Asn 

His 

Gin 

Gly Pro 

Lys 

Arg 

Ala 

Arg 

Glu 

Ser 

Gin 

Asp 

Arg 

Lys 

lie 

1745 




1750 




1755 




1760 

Phe 

Arg 

Gly 

Leu 

Glu 

lie 

Cys 

Cys 

Tyr 

Gly 

Pro 

Phe 

Thr 

Asn 

Met 

Pro 





1765 




1770 




1775 

Thr 

Asp 

Gin 

Leu 

Glu 

Trp 

Met 

Val 

Gin 

Leu 

Cys 

Gly 

Ala 

Ser 

Val 

Val 




1780 




1785 




1790 


Lys 

Glu 

Leu 

Ser 

Ser 

Phe 

Thr 

Leu 

Gly Thr 

Gly 

Val 

His 

Pro 

lie 

Val 



1795 




1800 




1805 



Val 

Val 

Gin 

Pro 

Asp 

Ala 

Trp 

Thr 

Glu 

Asp 

Asn 

Gly 

Phe 

His 

Ala 

lie 


1810 




1815 




1820 




Gly Gin 

Met 

Cys 

Glu 

Ala 

Pro 

Val 

Val 

Thr 

Arg 

Glu 

Trp Val 

Leu 

Asp 

1825 




1830 




1835 




1840 

Ser 

Val 

Ala 

Leu 

Tyr 

Gin Cys 

Gin 

Glu 

Leu 

Asp Thr 

Tyr 

Leu 

lie 

Pro 





1845 




1850 




1855 

Gin 

lie 

Pro 

His 

Ser 

His 

Tyr 

* 












1860 













(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: s754 A


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CTAGCCTGGG CAACAAACGA 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: s754 B

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GCAGGAAGCA GGAATGGAAC

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: s975 A

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

TAGGAGATGG ATTATTGGTG 20

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: s975 B

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

AGGCAACTTT GCAATGAGTG 20

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: tdj1474 A

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CAGAGTGAGA CCTTGTCTCA AA 22

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: tdj1474 B

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

TTCTGCAAAC ACCTTAAACT CAG 23

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: tdj1239 A

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

AACCTGGAAG GCAGAGGTTG

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:

(B) CLONE: tdj1239 B

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

TCTGTACCTG CTAAGCAGTG G
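
As an illustrative consistency check, which is not part of the sequence listing itself, the printed sequences of SEQ ID NOS: 3 to 10 can be compared against the LENGTH value declared for each entry. The following minimal Python sketch assumes the sequences exactly as printed above; the names `PRIMERS`, `clean`, `gc_fraction` and `length_mismatches` are hypothetical and introduced only for this illustration.

```python
# Sequences as printed in SEQ ID NOS: 3-10, keyed by SEQ ID number and paired
# with the LENGTH value declared in each entry's header.
# All names here are illustrative assumptions, not part of the listing.
PRIMERS = {
    3: ("CTAGCCTGGG CAACAAACGA", 20),
    4: ("GCAGGAAGCA GGAATGGAAC", 20),
    5: ("TAGGAGATGG ATTATTGGTG", 20),
    6: ("AGGCAACTTT GCAATGAGTG", 20),
    7: ("CAGAGTGAGA CCTTGTCTCA AA", 22),
    8: ("TTCTGCAAAC ACCTTAAACT CAG", 23),
    9: ("AACCTGGAAG GCAGAGGTTG", 20),
    10: ("TCTGTACCTG CTAAGCAGTG G", 21),
}


def clean(seq):
    """Drop the spaces that separate the printed 10-base groups."""
    return seq.replace(" ", "").upper()


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    bases = clean(seq)
    return (bases.count("G") + bases.count("C")) / len(bases)


def length_mismatches(primers):
    """Return SEQ ID numbers whose printed length differs from the declared one."""
    return [n for n, (seq, declared) in primers.items()
            if len(clean(seq)) != declared]


if __name__ == "__main__":
    for n, (seq, declared) in sorted(PRIMERS.items()):
        print(n, len(clean(seq)), declared, round(gc_fraction(seq), 2))
    print("mismatches:", length_mismatches(PRIMERS))
```

An empty mismatch list indicates only that the printed base counts agree with the declared lengths; it says nothing about whether individual bases were transcribed correctly.
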
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 111 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 2..111

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

G GKC TTA CTC TGT TGT CCC AGC TGG AGT ACA GWG TGC GAT CAT GAG 46
Xaa Leu Leu Cys Cys Pro Ser Trp Ser Thr Xaa Cys Asp His Glu
1865 1870 1875

GCT TAC TGT TGC TTG ACT CCT AGG CTC AAG CGA TCC TAT CAC CTC AGT 94
Ala Tyr Cys Cys Leu Thr Pro Arg Leu Lys Arg Ser Tyr His Leu Ser
1880 1885 1890 1895

CTC CAA GTA GCT GGA CT 111
Leu Gln Val Ala Gly
1900

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Xaa Leu Leu Cys Cys Pro Ser Trp Ser Thr Xaa Cys Asp His Glu Ala
1 5 10 15

Tyr Cys Cys Leu Thr Pro Arg Leu Lys Arg Ser Tyr His Leu Ser Leu
20 25 30

Gln Val Ala Gly
35 
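
The relationship between SEQ ID NO: 11 and SEQ ID NO: 12 can be illustrated by translating the coding region (LOCATION 2..111) with the standard genetic code. The sketch below is an assumption-laden illustration rather than the method used to prepare the listing: it treats any codon containing an IUPAC ambiguity code (here K and W) as Xaa, and the names `SEQ_ID_11`, `SEQ_ID_12` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}

# Nucleotide sequence as printed in SEQ ID NO: 11 (111 bases).
SEQ_ID_11 = (
    "G GKC TTA CTC TGT TGT CCC AGC TGG AGT ACA GWG TGC GAT CAT GAG "
    "GCT TAC TGT TGC TTG ACT CCT AGG CTC AAG CGA TCC TAT CAC CTC AGT "
    "CTC CAA GTA GCT GGA CT"
)

# Amino acid sequence as printed in SEQ ID NO: 12 (36 residues).
SEQ_ID_12 = (
    "Xaa Leu Leu Cys Cys Pro Ser Trp Ser Thr Xaa Cys Asp His Glu Ala "
    "Tyr Cys Cys Leu Thr Pro Arg Leu Lys Arg Ser Tyr His Leu Ser Leu "
    "Gln Val Ala Gly"
).split()


def translate(seq, start=0):
    """Translate complete codons from 0-based offset `start`.

    Codons not in the standard table (for example those containing an
    ambiguity code) are reported as Xaa; a trailing partial codon is ignored.
    """
    bases = seq.replace(" ", "").upper()
    residues = []
    for i in range(start, len(bases) - 2, 3):
        amino = CODON_TABLE.get(bases[i:i + 3])
        residues.append(THREE_LETTER[amino] if amino else "Xaa")
    return residues


if __name__ == "__main__":
    # LOCATION 2..111 is 1-based, so the first codon starts at offset 1.
    protein = translate(SEQ_ID_11, start=1)
    print(" ".join(protein))
    print("matches SEQ ID NO: 12:", protein == SEQ_ID_12)
```

Under these assumptions the 36 complete codons of the coding region reproduce the 36 residues of SEQ ID NO: 12, with the two Xaa positions arising from the ambiguous codons GKC and GWG.
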

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1534 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

GAGGCTAGAG GGCAGGCACT TTATGGCAAA CTCAGGTAGA ATTCTTCCTC TTCCGTCTCT 60
TTCCTTTTAC GTCATCGGGG AGACTGGGTG GCAATCGCAG CCCGAGAGAC GCATGGCTCT 120
TTCTGCCCTC CATCCTCTGA TGTACCTTGA TTTCGTATTC TGAGAGGCTG CTGCTTAGCG 180
GTAGCCCCTT GGTTTCCGTG GCAACGGAAA AGCGCGGGAA TTACAGATAA ATTAAAACTG 240
CGACTGCGCG GCGTGAGCTC GCTGAGACTT CCTGGACCCC GCACCAGGCT GTGGGGTTTC 300
TCAGATAACT GGGCCCCTGC GCTCAGGAGG CCTTCACCCT CTGCTCTGGG TAAAGGTAGT 360
AGAGTCCCGG GAAAGGGACA GGGGGCCCAA GTGATGCTCT GGGGTACTGG CGTGGGAGAG 420
TGGATTTCCG AAGCTGACAG ATGGGTATTC TTTGACGGGG GGTAGGGGCG GAACCTGAGA 480
GGCGTAAGGC GTTGTGAACC CTGGGGAGGG GGGCAGTTTG TAGGTCGCGA GGGAAGCGCT 540
GAGGATCAGG AAGGGGGCAC TGAGTGTCCG TGGGGGAATC CTCGTGATAG GAACTGGAAT 600
ATGCCTTGAG GGGGACACTA TGTCTTTAAA AACGTCGGCT GGTCATGAGG TCAGGAGTTC 660
CAGACCAGCC TGACCAACGT GGTGAAACTC CGTCTCTACT AAAAATACNA AAATTAGCCG 720
GGCGTGGTGC CGCTCCAGCT ACTCAGGAGG CTGAGGCAGG AGAATCGCTA GAACCCGGGA 780
GGCGGAGGTT GCAGTGAGCC GAGATCGCGC CATTGCACTC CAGCCTGGGC GACAGAGCGA 840
GACTGTCTCA AAACAAAACA AAACAAAACA AAACAAAAAA CACCGGCTGG TATGTATGAG 900
AGGATGGGAC CTTGTGGAAG AAGAGGTGCC AGGAATATGT CTGGGAAGGG GAGGAGACAG 960
GATTTTGTGG GAGGGAGAAC TTAAGAACTG GATCCATTTG CGCCATTGAG AAAGCGCAAG 1020
AGGGAAGTAG AGGAGCGTCA GTAGTAACAG ATGCTGCCGG CAGGGATGTG CTTGAGGAGG 1080
ATCCAGAGAT GAGAGCAGGT CACTGGGAAA GGTTAGGGGC GGGGAGGCCT TGATTGGTGT 1140
TGGTTTGGTC GTTGTTGATT TTGGTTTTAT GCAAGAAAAA GAAAACAACC AGAAACATTG 1200
GAGAAAGCTA AGGCTACCAC CACCTACCCG GTCAGTCACT CCTCTGTAGC TTTCTCTTTC 1260
TTGGAGAAAG GAAAAGACCC AAGGGGTTGG CAGCGATATG TGAAAAAATT CAGAATTTAT 1320
GTTGTCTAAT TACAAAAAGC AACTTCTAGA ATCTTTAAAA ATAAAGGACG TTGTCATTAG 1380
TTCTTCTGGT TTGTATTATT CTAAAACCTT CCAAATCTTC AAATTTACTT TATTTTAAAA 1440
TGATAAAATG AAGTTGTCAT TTTATAAACC TTTTAAAAAG ATATATATAT ATGTTTTTCT 1500
AATGTGTTAA AGTTCATTGG AACAGAAAGA AATG 1534

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1924 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

GAGGCTAGAG GGCAGGCACT TTATGGCAAA CTCAGGTAGA ATTCTTCCTC TTCCGTCTCT 60
TTCCTTTTAC GTCATCGGGG AGACTGGGTG GCAATCGCAG CCCGAGAGAC GCATGGCTCT 120
TTCTGCCCTC CATCCTCTGA TGTACCTTGA TTTCGTATTC TGAGAGGCTG CTGCTTAGCG 180
GTAGCCCCTT GGTTTCCGTG GCAACGGAAA AGCGCGGGAA TTACAGATAA ATTAAAACTG 240
CGACTGCGCG GCGTGAGCTC GCTGAGACTT CCTGGACCCC GCACCAGGCT GTGGGGTTTC 300
TCAGATAACT GGGCCCCTGC GCTCAGGAGG CCTTCACCCT CTGCTCTGGG TAAAGGTAGT 360
AGAGTCCCGG GAAAGGGACA GGGGGCCCAA GTGATGCTCT GGGGTACTGG CGTGGGAGAG 420
TGGATTTCCG AAGCTGACAG ATGGGTATTC TTTGACGGGG GGTAGGGGCG GAACCTGAGA 480
GGCGTAAGGC GTTGTGAACC CTGGGGAGGG GGGCAGTTTG TAGGTCGCGA GGGAAGCGCT 540
GAGGATCAGG AAGGGGGCAC TGAGTGTCCG TGGGGGAATC CTCGTGATAG GAACTGGAAT 600
ATGCCTTGAG GGGGACACTA TGTCTTTAAA AACGTCGGCT GGTCATGAGG TCAGGAGTTC 660
CAGACCAGCC TGACCAACGT GGTGAAACTC CGTCTCTACT AAAAATACNA AAATTAGCCG 720
GGCGTGGTGC CGCTCCAGCT ACTCAGGAGG CTGAGGCAGG AGAATCGCTA GAACCCGGGA 780
GGCGGAGGTT GCAGTGAGCC GAGATCGCGC CATTGCACTC CAGCCTGGGC GACAGAGCGA 840
GACTGTCTCA AAACAAAACA AAACAAAACA AAACAAAAAA CACCGGCTGG TATGTATGAG 900
AGGATGGGAC CTTGTGGAAG AAGAGGTGCC AGGAATATGT CTGGGAAGGG GAGGAGACAG 960
GATTTTGTGG GAGGGAGAAC TTAAGAACTG GATCCATTTG CGCCATTGAG AAAGCGCAAG 1020
AGGGAAGTAG AGGAGCGTCA GTAGTAACAG ATGCTGCCGG CAGGGATGTG CTTGAGGAGG 1080
ATCCAGAGAT GAGAGCAGGT CACTGGGAAA GGTTAGGGGC GGGGAGGCCT TGATTGGTGT 1140
TGGTTTGGTC GTTGTTGATT TTGGTTTTAT GCAAGAAAAA GAAAACAACC AGAAACATTG 1200
GAGAAAGCTA AGGCTACCAC CACCTACCCG GTCAGTCACT CCTCTGTAGC TTTCTCTTTC 1260
TTGGAGAAAG GAAAAGACCC AAGGGGTTGG CAGCGATATG TGAAAAAATT CAGAATTTAT 1320
GTTGTCTAAT TACAAAAAGC AACTTCTAGA ATCTTTAAAA ATAAAGGACG TTGTCATTAG 1380
TTCTTCTGGT TTGTATTATT CTAAAACCTT CCAAATCTTC AAATTTACTT TATTTTAAAA 1440
TGATAAAATG AAGTTGTCAT TTTATAAACC TTTTAAAAAG ATATATATAT ATGTTTTTCT 1500
AATGTGTTAA AGTTCATTGG AACAGAAAGA AATGGATTTA TCTGCTCTTC GCGTTGAAGA 1560
AGTACAAAAT GTCATTAATG CTATGCAGAA AATCTTAGAG TGTCCCATCT GGTAAGTCAG 1620
CACAAGAGTG TATTAATTTG GGATTCCTAT GATTATCTCC TATGCAAATG AACAGAATTG 1680
ACCTTACATA CTAGGGAAGA AAAGACATGT CTAGTAAGAT TAGGCTATTG TAATTGCTGA 1740
TTTTCTTAAC TGAAGAACTT TAAAAATATA GAAAATGATT CCTTGTTCTC CATCCACTCT 1800
GCCTCTCCCA CTCCTCTCCT TTTCAACACA ATCCTGTGGT CCGGGAAAGA CAGGGCTCTG 1860
TCTTGATTGG TTCTGGACTG GGCAGGATCT GTTAGATACT GCATTTGCTT TCTCCAGCTC 1920
TAAA 1924

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 631 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

AAATGCTGAT GATAGTATAG AGTATTGAAG GGATCAATAT AATTCTGTTT TGATATCTGA
AAGCTCACTG AAGGTAAGGA TCGTATTCTC TGCTGTATTC TCAGTTCCTG ACACAGCAGA
CATTTAATAA ATATTGAACG AACTTGAGGC CTTATGTTGA CTCAGTCATA ACAGCTCAAA
GTTGAACTTA TTCACTAAGA ATAGCTTTAT TTTTAAATAA ATTATTGAGC CTCATTTATT
TTCTTTTTCT CCCCCCCCTA CCCTGCTAGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA
AAGTGTGACC ACATATTTTG CAAGTAAGTT TGAATGTGTT ATGTGGCTCC ATTATTAGCT
TTTGTTTTTG TCCTTCATAA CCCAGGAAAC ACCTAACTTT ATAGAAGCTT TACTTTCTTC
AATTAAGTGA GAACGAAAAT CCAACTCCAT TTCATTCTTT CTCAGAGAGT ATATAGTTAT
CAAAAGTTGG TTGTAATCAT AGTTCCTGGT AAAGTTTTGA CATATATTAT CTTTTTTTTT
TTTTGAGACA AGTCTCGCTC TGTCGCCCAG GCTGGAGTGC AGTGGCATGA GGCTTGCTCA
CTGCACCTCC GCCCCCGAGT TCAGCGACTC T

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 481 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

TGAGATCTAG ACCACATGGT CAAAGAGATA GAATGTGAGC AATAAATGAA CCTTAAATTT 60
TTCAACAGCT ACTTTTTTTT TTTTTTTTTG AGACAGGGKC TTACTCTGTT GTCCCAGCTG 120
GAGTACAGWG TGCGATCATG AGGCTTACTG TTGCTTGACT CCTAGGCTCA AGCGATCCTA 180
TCACCTCAGT CTCCAAGTAG CTGGACTGTA AGTGCACACC ACCATATCCA GCTAAATTTT 240
GTGTTTTCTG TAGAGACGGG GTTTCGCCAT GTTTCCCAGG CTGGTCTTGA ACTTTGGGCT 300
TAACCCGTCT GCCCACCTAG GCATCCCAAA GTGCTAGGAT TACAGGTGTG AGTCATCATG 360
CCTGGCCAGT ATTTTAGTTA GCTCT3TCTT TTCAAGTCAT ATACAAGTTC ATTTTCTTTT 420
AAGTTTAGTT AACAACCTTA TATCATGTAT TCTTTTCTAG CATAAAGAAA GATTCGAGGC 480
C 481

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 522 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

TGTGATCATA ACAGTAAGCC ATATGCATGT AAGTTCAGTT TTCATAGATC ATTGCTTATG 60
TAGTTTAGGT TTTTGCTTAT GCAGCATCCA AAAACAATTA GGAAACTATT GCTTGTAATT 120
CACCTGCCAT TACTTTTTAA AT3GCTCTTA AGGGCAGTTG TGAGATTATC TTTTCATGGC 180
TATTTGCCTT TTGAGTATTC TTTCTACAAA AGGAAGTAAA TTAAATTGTT CTTTCTTTCT 240
TTATAATTTA TAGATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300
GTCCTTTATG TAAGAATGAT ATAACCAAAA GGTATATAAT TTGGTAATGA TGCTAGGTTG 360
GAAGCAACCA CAGTAGGAAA AAGTAGAAAT TATTTAATAA CATAGCGTTC CTATAAAACC 420
ATTCATCAGA AAAATTTATA AAAGAGTTTT TAGCACACAG TAAATTATTT CCAAAGTTAT 480
TTTCCTGAAA GTTTTATGGG CATCTGCCTT ATACAGGTAT TG 522

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 465 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

GGTAGGCTTA AATGAATGAC AAAAAGTTAC TAAATCACTG CCATCACACG GTTTATACAG 60
ATGTCAATGA TGTATTGATT ATAGAGGTTT TCTACTGTTG CTGCATCTTA TTTTTATTTG 120
TTTACATGTC TTTTCTTATT TTAGTGTCCT TAAAAGGTTG ATAATCACTT GCTGAGTGTG 180
TTTCTCAAAC AATTTAATTT CAGGAGCCTA CAAGAAAGTA CGAGATTTAG TCAACTTGTT 240
GAAGAGCTAT TGAAAATCAT TTGTGCTTTT CAGCTTGACA CAGGTTTGGA GTGTAAGTGT 300
TGAATATCCC AAGAATGACA CTCAAGTGCT GTCCATGAAA ACTCAGGAAG TTTGCACAAT 360
TACTTTCTAT GACGTGGTGA TAAGACCTTT TAGTCTAGGT TAATTTTAGT TCTGTATCTG 420
TAATCTATTT TAAAAAATTA CTCCCACTGG TCTCACACCT TATTT 465

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 513 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

AAAAAATCAC AGGTAACCTT AATGCATTGT CTTAACACAA CAAAGAGCAT ACATAGGGTT 
TCTCTTGGTT TCTTTGATTA TAATTCATAC ATTTTTCTCT AACTGCAAAC ATAATGTTTT 
CCCTTGTATT TTACAGATGC AAACAGCTAT AATTTTGCAA AAAAGGAAAA TAACTCTCCT 
GAACATCTAA AAGATGAAGT TTCTATCATC CAAAGTATGG GCTACAGAAA CCGTGCCAAA 
AGACTTCTAC AGAGTGAACC CGAAAATCCT TCCTTGGTAA AACCATTTGT TTTCTTCTTC 
TTCTTCTTCT TCTTTTCTTT TTTTTTTCTT TTTTTTTTTG AGATGGAGTC TTGCTCTGTG 
GCCCAGGCTA GAAGCAGTCC TCCTGCCTTA GCCNCCTTAG TAGCTGGGAT TACAGGCACG 
CGCACCATGC CAGGCTAATT TTTGTATTTT TAGTAGAGAC GGGGTTTCAT CATGTTGGCC 
AGGCTGGTCT CGAACTCCTA ACCTCAGGTG ATC 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6769 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 


(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 


ATGATGGAGA TCTTAAAAAG TAATCATTCT GGGGCTGGGC GTAGTAGCTT GCACCTGTAA 60
TCCCAGCACT TCGGGAGGCT GAGGCAGGCA GATAATTTGA GGTCAGGAGT TTGAGACCAG 120
CCTGGCCAAC ATGGTGAAAC CCATCTCTAC TAAAAATACA AAAATTAGCT GGGTGTGGTG 180
GCACGTACCT GTAATCCCAG CTACTCGGGA GGCGGAGGCA CAAGAATTGC TTGAACCTAG 240
GACGCGGAGG TTGCAGCGAG CCAAGATCGC GCCACTGCAC TCCAGCCTGG GCCGTAGAGT 300
GAGACTCTGT CTCAAAAAAG AAAAAAAAGT AATTGTTCTA GCTGGGCGCA GTGGCTCTTG 360
CCTGTAATCC CAGCACTTTG GGAGGCCAAG GCGGGTGGAT CTCGAGTCCT AGAGTTCAAG 420
ACCAGCCTAG GCAATGTGGT GAAACCCCAT CGCTACAAAA AATACAAAAA TTAGCCAGGC 480
ATGGTGGCGT GCGCATGTAG TCCCAGCTCC TTGGGAGGCT GAGGTGGGAG GATCACTTGA 540
ACCCAGGAGA CAGAGGTTGC AGTGAACCGA GATCACGCCA CCACGCTCCA GCCTGGGCAA 600
CAGAACAAGA CTCTGTCTAA AAAAATACAA ATAAAATAAA AGTAGTTCTC ACAGTACCAG 660
CATTCATTTT TCAAAAGATA TAGAGCTAAA AAGGAAGGAA AAAAAAAGTA ATGTTGGGCT 720
TTTAAATACT CGTTCCTATA CTAAATGTTC TTAGGAGTGC TGGGGTTTTA TTGTCATCAT 780
TTATCCTTTT TAAAAATGTT ATTGGCCAGG CACGGTGGCT CATGGCTGTA ATCCCAGCAC 840
TTTGGGAGGC CGAGGCAGGC AGATCACCTG AGGTCAGGAG TGTGAGACCA GCCTGGCCAA 900
CATGGCGAAA CCTGTCTCTA CTAAAAATAC AAAAATTAAC TAGGCGTGGT GGTGTACGCC 960
TGTAGTCCCA GCTACTCGGG AGGCTGAGGC AGGAGAATCA ACTGAACCAG GGAGGTGGAG 1020
GTTGCAGTGT GCCGAGATCA CGCCACTGCA CTCTAGCCTG GCAACAGAGC AAGATTCTGT 1080
CTCAAAAAAA AAAAACATAT ATACACATAT ATCCCAAAGT GCTGGGATTA CATATATATA 1140
TATATATATA TATTATATAT ATATATATAT ATATATGTGA TATATATGTG ATATATATAT 1200
AACATATATA TATGTAATAT ATATGTGATA TATATATAAT ATATATATGT AATATATATG 1260
TGATATATAT ATATACACAC ACACACACAT ATATATGTAT GTGTGTGTAC ACACACACAC 1320
ACAAATTAGC CAGGCATAGT TGCACACGCT TGGTAGACCC AGCTACTCAG GAGGCTGAGG 1380
GAGGAGAATC TCTTGAACTT AGGAGGCGGA GGTTGCAGTG AGCTGAGATT GCGCCACTGC 1440
ACTCCAGCCT GGGTGACAGA GCAGGACTCT GTACACCCCC CAAAACAAAA AAAAAAGTTA 1500
TCAGATGTGA TTGGAATGTA TATCAAGTAT CAGCTTCAAA ATATGCTATA TTAATACTTC 1560
AAAAATTACA CAAATAATAC ATAATCAGGT TTGAAAAATT TAAGACAACM SAARAAAAAA 1620
WYCKAATCAC AMATATCCCA CACATTTTAT TATTMCTMCT MCWATTATTT TGWAGAGMCT 1680
GGGTCTCACY CYKTTGCTWA TGCTGGTCTT TGAACYCCYK GCCYCAARCA RTCCTSCTCC 1740
ABCCTCCCAA RGTGCTGGGG ATWATAGGCA TGARCTAACC GCACCCAGCC CCAGACATTT 1800
TAGTGTGTAA ATTCCTGGGC ATTTTTTCAA GGCATCATAC ATGTTAGCTG ACTGATGATG 1860
GTCAATTTAT TTTGTCCATG GTGTCAAGTT TCTCTTCAGG AGGAAAAGCA CAGAACTGGC 1920
CAACAATTGC TTGACTGTTC TTTACCATAC TGTTTAGCAG GAAACCAGTC TCAGTGTCCA 1980
ACTCTCTAAC CTTGGAACTG TGAGAACTCT GAGGACAAAG CAGCGGATAC AACCTCAAAA 2040
GACGTCTGTC TACATTGAAT TGGGTAAGGG TCTCAGGTTT TTTAAGTATT TAATAATAAT 2100
TGCTGGATTC CTTATCTTAT AGTTTTGCCA AAAATCTTGG TCATAATTTG TATTTGTGGT 2160
AGGCAGCTTT GGGAAGTGAA TTTTATGAGC CCTATGGTGA GTTATAAAAA ATGTAAAAGA 2220
CGCAGTTCCC ACCTTGAAGA ATCTTACTTT AAAAAGGGAG CAAAAGAGGC CAGGCATGGT 2280
GGCTCACACC TGTAATCCCA GCACTTTGGG AGGCCAAAGT GGGTGGATCA CCTGAGGTCG 2340
GGAGTTCGAG ACCAGCCTAG CCAACATGGA GAAACTCTGT CTGTACCAAA AAATAAAAAA 2400
TTAGCCAGGT GTGGTGGCAC ATAACTGTAA TCCCAGCTAC TCGGGAGGCT GAGGCAGGAG 2460
AATCACTTGA ACCCGGGAGG TGGAGGTTGC GGTGAACCGA GATCGCACCA TTGCACTCCA 2520
GCCTGGGCAA AAATAGCGAA ACTCCATCTA AAAAAAAAAA AGAGAGCAAA AGAAAGAMTM 2580
TCTGGTTTTA AMTMTGTGTA AATATGTTTT TGGAAAGATG GAGAGTAGCA ATAAGAAAAA 2640
ACATGATGGA TTGCTACAGT ATTTAGTTCC AAGATAAATT GTACTAGATG AGGAAGCCTT 2700
TTAAGAAGAG CTGAATTGCC AGGCGCAGTG GCTCACGCCT GTAATCCCAG CACTTTGGGA 2760
GGCCGAGGTG GGCGGATCAC CTGAGGTCGG GAGTTCAAGA CCAGCCTGAC CAACATGGAG 2820
AAACCCCATC TCTACTAAAA AAAAAAAAAA AAAAATTAGC CGGGGTGGTG GCTTATGCCT 2880
GTAATCCCAG CTACTCAGGA GGCTGAGGCA GGAGAATCGC TTGAACCCAG GAAGCAGAGG 2940
TTGCAGTGAG CCAAGATCGC ACCATTGCAC TCCAGCCTAG GCAACAAGAG TGAAACTCCA 3000
TCTCAAAAAA AAAAAAAAAG AGCTGAATCT TGGCTGGGCA GGATGGCTCG TGCCTGTAAT 3060
CCTAACGCTT TGGAAGACCG AGGCAGAAGG ATTGGTTGAG TCCACGAGTT TAAGACCAGC 3120
CTGGCCAACA TAGGGGAACC CTGTCTCTAT TTTTAAAATA ATAATACATT TTTGGCCGGT 3180
GCGGTGGCTC ATGCCTGTAA TCCCAATACT TTGGGAGGCT GAGGCAGGTA GATCACCTGA 3240
GGTCAGAGTT CGAGACCAGC CTGGATAACC TGGTGAAACC CCTCTTTACT AAAAATACAA 3300
AAAAAAAAAA AAATTAGCTG GGTGTGGTAG CACATGCTTG TAATCCCAGC TACTTGGGAG 3360
GCTGAGGCAG GAGAATCGCT TGAACCAGGG AGGCGGAGGT TACAATGAGC CAACACTACA 3420
CCACTGCACT CCAGCCTGGG CAATAGAGTG AGACTGCATC TCAAAAAAAT AATAATTTTT 3480
AAAAATAATA AATTTTTTTA AGCTTATAAA AAGAAAAGTT GAGGCCAGCA TAGTAGCTCA 3540
CATCTGTAAT CTCAGCAGTG GCAGAGGATT GCTTGAAGCC AGGAGTTTGA GACCAGCCTG 3600
GGCAACATAG CAAGACCTCA TCTCTACAAA AAAATTTCTT TTTTAAATTA GCTGGGTGTG 3660
GTGGTGTGCA TCTGTAGTCC CAGCTACTCA GGAGGCAGAG GTGAGTGGAT ACATTGAACC 3720
CAGGAGTTTG AGGCTGTAGT GAGCTATGAT CATGCCACTG CACTCCAACC TGGGTGACAG 3780
AGCAAGACCT CCAAAAAAAA AAAAAAAAGA GCTGCTGAGC TCAGAATTCA AACTGGGCTC 3840
TCAAATTGGA TTTTCTTTTA GAATATATTT ATAATTAAAA AGGATAGCCA TCTTTTGAGC 3900
TCCCAGGCAC CACCATCTAT TTATCATAAC ACTTACTGTT TTCCCCCCTT ATGATCATAA 3960
ATTCCTAGAC AACAGGCATT GTAAAAATAG TTATAGTAGT TGATATTTAG GAGCACTTAA 4020
CTATATTCCA GGCACTATTG TGCTTTTCTT GTATAACTCA TTAGATGCTT GTCAGACCTC 4080
TGAGATTGTT CCTATTATAC TTATTTTACA GATGAGAAAA TTAAGGCACA GAGAAGTTAT 4140
GAAATTTTTC CAAGGTATTA AACCTAGTAA GTGGCTGAGC CATGATTCAA ACCTAGGAAG 4200
TTAGATGTCA GAGCCTGTGC TTTTTTTTTG TTTTTGTTTT TGTTTTCAGT AGAAACGGGG 4260
GTCTCACTTT GTTGGCCAGG CTGGTCTTGA ACTCCTAACC TCAAATAATC CACCCATCTC 4320
GGCCTCCTCA AGTGCTGGGA TTACAGGTGA GAGCCACTGT GCCTGGCGAA GCCCATGCCT 4380
TTAACCACTT CTCTGTATTA CATACTAGCT TAACTAGCAT TGTACCTGCC ACAGTAGATG 4440
CTCAGTAAAT ATTTCTAGTT GAATATCTGT TTTTCAACAA GTACATTTTT TTAACCCTTT 4500
TAATTAAGAA AACTTTTATT GATTTATTTT TTGGGGGGAA ATTTTTTAGG ATCTGATTCT 4560
TCTGAAGATA CCGTTAATAA GGCAACTTAT TGCAGGTGAG TCAAAGAGAA CCTTTGTCTA 4620
TGAAGCTGGT ATTTTCCTAT TTAGTTAATA TTAAGGATTG ATGTTTCTCT CTTTTTAAAA 4680
ATATTTTAAC TTTTATTTTA GGTTCAGGGA TGTATGTGCA GTTTGTTATA TAGGTAAACA 4740
CACGACTTGG GATTTGGTGT ATAGATTTTT TTCATCATCC GGGTACTAAG CATACCCCAC 4800
AGTTTTTTGT TTGCTTTCTT TCTGAATTTC TCCCTCTTCC CACCTTCCTC CCTCAAGTAG 4860
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GCTGGTGTTT CTCCAGACTA GAATCATGGT 
TAGTTCTCTC ATTTTATAGT GGAGGAAATA 
CACTGTCCAA AGGAATTTAG GATAACAGTA 
TGTTCTCTAA GTTCCTCATA TACAGTAATA 
ATGTTCAAGG ACTTCATTTT CAACTCTTTC 

10 

TCAAGCTTTG TCTGTATGTT ATATAATAAA 
XTCCTTAGGA ATTATTGCTT GACCCAGGTT 
TGCCCTGTTG CCAGGATGGA GTGTAGTGGC 
CCTGGTTCAA GCGATTCTCC TGTCTCAATC 
CACCACGCCC GGTTAATTGA CCATTCCATT 
TTGAGACAGA GTCTTGCTCT GTTGCCCAGG 

CGCAACGTCT GCCTCCCAGG TTGAAGCCAT 

25 

ACTACAGGCG CGCGCCACCA CACCCGGCTA 
CACCATGTTG GCCAGGCTGG TCTTGAACTC 
3o TCCCAAAGTG CTGGAATTAC AGGCTTGAGC 

TAGAAGTTTC TAAAGGAGAo AGCAGCTTTC 

TAATCGAAAG AGCTAAAATG TTTGATCTTG 

35 

AGTGTTTCTT ATTAGGACTC TGT C T TT TCC 
AATCACCCCT CAAGGAACCA GGGATGAAAT 
40 CAAAGTTTGC CAACTTAACA GGCACTGAAA 

GATTATTCTG AAGACCATTT GGGACCTTTA 

GTATCATTCT CTGTCAAATG TCGTGGTATG 

45 

TGTACCTATA AT AAGAC CTT CTTGTAACTG 
GTTTGTTTGT TTTTTTTTGA GATGGGGTCT 
50 TGCAATCTTG GCTCACTGCA ACCTCCACCT 

CCTCCTGAGT AGCTGGGACT ACAGGCGCAT 

TT AT AG AG AT GGGGTTTCAC CATGTTACCG 

55 

GTCTGCCCAC TTCAGCCTCC CAAAGTGCTG 


ATTGGAAGAA 

AC CTT AG AG A 

TCATCTAGTT 

4920 

CCCTTTTTGT 

TTGTTGGATT 

TAGTTATTAG 

4980 

GAACTCTGCA 

CATGCTTGCT 

TCTAGCAGAT 

5040 

TTGACACAGC 

AGTAATTGTG 

ACTGATGAAA 

5100 

TTTCCTCTGT 

TCCTTATTTC 

CACATATCTC 

5160 

CTACAAGCAA 

CCCCAACTAT 

GTTACCTACC 

5220 

TTTTTTTTTT 

TTTTTTTGGA 

GACGGGGTCT 

5280 

GCCATCTCGG 

CTCACTGCAA 

TCTCCAACTC 

5340 

TCACGAGTAG 

CTGGGACTAC 

AGGTATACAC 

5400 

TCTTTCTTTC 

TCTCTTTTTT 

TTTTTTTTTT 

5460 

CTGGAGTACA 

GAGGTGTGAT 

CTCACCTCTC 

5520 

ACTCCTGCCT 

CAGCCTCTCT 

AGTAGCTGGG 

5580 

ATTTTTGTAT 

TTTTAGTAGA 

GATGGGGTTT 

5640 

ATGACCTCAA 

GTGGTCCACC 

CGCCTCAGCC 

5700 

CACCGTGCCC 

AGCAACCATT 

TCATTTCAAC 

5760 

ACTAACTAAA 

TAAGATTGGT 

CAGCTTTCTG 

5820 

GTCATTTGAC 

AGTTCTGCAT 

ACATGTAACT 

5880 

CTATAGTGTG 

GGAGATCAAG 

AATTGTTACA 

5940 

CAGTTTGGAT 

TCTGCAAAAA 

AGGGTAATGG 

6000 

AGAGAGTGGG 

TAGATACAGT 

ACTGTAATTA 

6060 

CAACCCACAA 

AATCTCTTGG 

CAGAGTTAGA 

6120 

GTCTGATAGA 

TTTAAATGGT 

ACTAGACTAA 

6180 

ATTGTTGCCC 

TTTCGCTTTT 

TTTTTTGTTT 

6240 

CACTCTGTTG 

CCCAGGCTGG 

AGTGCAGTGA 

6300 

CCAAAGGCTC 

AAGCTATCCT 

CCCACTTCAG 

6360 

GCCACCACAC 

CCGGTTAATT 

TTTTGTGGTT 

6420 

AGGCTGGTCT 

CAAACTCCTG 

GACTCAAGCA 

6480 

CAGTTACAGG 

CTTGAGCCAC 

TGTGCCTGGC 

6540 
CTGCCCTTTA CTTTTAATTG GTGTATTTGT GTTTCATCTT TTACCTACTG GTTTTTAAAT 6600
ATAGGGAGTG GTAAGTCTGT AGATAGAACA GAGTATTAAG TAGACTTAAT GGCCAGTAAT 6660
CTTTAGAGTA CATCAGAACC AGTTTTCTGA TGGCCAATCT GCTTTTAATT CACTCTTAGA 6720
CGTTAGAGAA ATAGGTGTGG TTTCTGCATA GGGAAAATTC TGAAATTAA 6769

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4249 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

GATCCTAAGT GGAAATAATC TAGGTAAATA GGAATTAAAT GAAAGAGTAT GAGCTACATC 60
TTCAGTATAC TTGGTAGTTT ATGAGGTTAG TTTCTCTAAT ATAGCCAGTT GGTTGATTTC 120
CACCTCCAAG GTGTATGAAG TATGTATTTT TTTAATGACA ATTCAGTTTT TGAGTACCTT 180
GTTATTTTTG TATATTTTCA GCTGCTTGTG AATTTTCTGA GACGGATGTA ACAAATACTG 240
AACATCATCA ACCCAGTAAT AATGATTTGA ACACCACTGA GAAGCGTGCA GCTGAGAGGC 300
ATCCAGAAAA GTATCAGGGT AGTTCTGTTT CAAACTTGCA TGTGGAGCCA TGTGGCACAA 360
ATACTCATGC CAGCTCATTA CAGCATGAGA ACAGCAGTTT ATTACTCACT AAAGACAGAA 420
TGAATGTAGA AAAGGCTGAA TTCTGTAATA AAAGCAAACA GCCTGGCTTA GCAAGGAGCC 480
AACATAACAG ATGGGCTGGA AGTAAGGAAA CATGTAATGA TAGGCGGACT CCCAGCACAG 540
AAAAAAAGGT AGATCTGAAT GCTGATCCCC TGTGTGAGAG AAAAGAATGG AATAAGCAGA 600
AACTGCCATG CTCAGAGAAT CCTAGAGATA CTGAAGATGT TCCTTGGATA ACACTAAATA 660
GCAGCATTCA GAAAGTTAAT GAGTGGTTTT CCAGAAGTGA TGAACTGTTA GGTTCTGATG 720
ACTCACATGA TGGGGAGTCT GAATCAAATG CCAAAGTAGC TGATGTATTG GACGTTCTAA 780

ATGAGGTAC-A TGAATATTCT GGTTCTTCAG AGAAAATAGA CTTACTGGCC AGTGATCCTC 840
ATGAGGCTTT AATATGTAAA AGTGAAAGAG TTCACTCCAA ATCAGTAGAG AGTAATATTG 900
AAGGCCAAAT ATTTGGGAAA ACCTATCGGA AGAAGGCAAG CCTCCCCAAC TTAAGCCATG 960
TAACTGAAAA TCTAATTATA GGAGCATTTG TTACTGAGCC ACAGATAATA CAAGAGCGTC 1020
CCCICACAAA TAAATTAAAG CGTAAAAGGA GACCTACATC AGGCCTTCAT CCTGAGGATT 1080
TTATCAAGAA AGCAGATTTG GCAGTTCAAA AGACTCCTGA AATGATAAAT CAGGGAACTA 1140
ACCAAACGGA GCAGAATGGT CAAGTGATGA ATATTACTAA TAGTGGTCAT GAGAATAAAA 1200
CAAAAGGTGA TTCTATTCAG AATGAGAAAA ATCCTAACCC AATAGAATCA CTCGAAAAAG 1260
AATCTGCTTT CAAAACGAAA GCTGAACCTA TAAGCAGCAG TATAAGCAAT ATGGAACTCG 1320
AATTAAATAT CCACAATTCA AAAGCACCTA AAAAGAATAG GCTGAGGAGG AAGTCTTCTA 1380
CCAGGCATAT TCATGCGCTT GAACTAGTAG TCAGTAGAAA TCTAAGCCCA CCTAATTGTA 1440
CTGAATTGCA AATTGATAGT TGTTCTAGCA GTGAAGAGAT AAAGAAAAAA AAGTACAACC 1500
AAATGCCAGT CAGGCACAGC AGAAACCTAC AACTCATGGA AGGTAAAGAA CCTGCAACTG 1560
GAGCCAAGAA GAGTAACAAG CCAAATGAAC AGACAAGTAA AAGACATGAC AGCGATACTT 1620
TCCCAGAGCT GAAGTTAACA AATGCACCTG GTTCTTTTAC TAAGTGTTCA AATACCAGTG 1680
AACTTAAAGA ATTTGTCAAT CCTAGCCTTC CAAGAGAAGA AAAAGAAGAG AACTAGAAAC 1740
AGTTAAAGTG TCTAATAATG CTGAAGACCC CAAAGATCTC ATGTTAAGTG GAGAAAGGGT 1800
TTTGCAAACT GAAAGATCTG TAGAGAGTAG CAGTATTTCA TTGGTACCTG GTACTGATTA 1860
TGGCACTCAG GAAAGTATCT CGTTACTGGA AGTTAGCACT CTAGGGAAGG CAAAAACAGA 1920
ACCAAATAAA TGTGTGAGTC AGTGTGCAGC ATTTGAAAAC CCCAAGGGAC TAATTCATGG 1980
TTGTTCCAAA GATAATAGAA ATGACACAGA AGGCTTTAAG TATCCATTGG GACATGAAGT 2040
TAACCACAGT CGGGAAACAA GCATAGAAAT GGAAGAAAGT GAACTTGATG CTCAGTATTT 2100
GCAGAATACA TTCAAGGTTT CAAAGCGCCA GTCATTTGCT CCGTTTTCAA ATCCAGGAAA 2160
TGCAGAAGAG GAATGTGCAA CATTCTCTGC CCACTCTGGG TCCTTAAAGA AACAAAGTCC 2220
AAAAGTCACT TTTGAATGTG AACAAAAGGA AGAAAATCAA GGAAAGAATG AGTCTAATAT 2280
CAAGCCTGTA CAGACAGTTA ATATCACTGC AGGCTTTCCT GTGGTTGGTC AGAAAGATAA 2340
GCCAGTTGAT AATGCCAAAT GTAGTATCAA AGGAGGCTCT AGGTTTTGTC TATCATCTCA 2400
GTTCAGAGGC AACGAAACTG GACTCATTAC TCCAAATAAA CATGGACTTT TACAAAACCC 2460
ATATCGTATA CCACCACTTT TTCCCATCAA GTCATTTGTT AAAACTAAAT GTAAGAAAAA 2520
TCTGCTAGAG GAAAACTTTG AGGAACATTC AATGTCACCT GAAAGAGAAA TGGGAAATGA 2580
GAACATTCCA AGTACAGTGA GCACAATTAG CCGTAATAAC ATTAGAGAAA ATGTTTTTAA 2640
AGAAGCCAGC TCAAGCAATA TTAATGAAGT AGGTTCCAGT ACTAATGAAG TGGGCTCCAG 2700
TATTAATGAA ATAGGTTCCA GTGATGAAAA CATTCAAGCA GAACTAGGTA GAAACAGAGG 2760
GCCAAAATTG AATGCTATGC TTAGATTAGG GGTTTTGCAA CCTGAGGTCT ATAAACAAAG 2820
TCTTCCTGGA AGTAATTGTA AGCATCCTGA AATAAAAAAG CAAGAATATG AAGAAGTAGT 2880
TCAGACTGTT AATACAGATT TCTCTCCATA TCTGATTTCA GATAACTTAG AACAGCCTAT 2940
GGGAAGTAGT CATGCATCTC AGGTTTGTTC TGAGACACCT GATGACCTGT TAGATGATGG 3000
TGAAATAAA3 GAAGATACTA GTTTTGCTGA AAATGACATT AAGGAAAGTT CTGCTGTTTT 3060
TAGCAAAAGC GTCCAGAAAG GAGAGCTTAG CAGGAGTCCT AGCCCTTTCA CCCATACACA 3120
TTTGGCTCAG GGTTACCGAA GAGGGGCCAA GAAATTAGAG TCCTCAGAAG AGAACTTATC 3180
TAGTGAGGAT GAAGAGCTTC CCTGCTTCCA ACACTTGTTA TTTGGTAAAG TAAACAATAT 3240
ACCTTCTCAG TCTACTAGGC ATAGCACCGT TGCTACCGAG TGTCTGTCTA AGAACACAGA 3300
GGAGAATTTA TTATCATTGA AGAATAGCTT AAATGACTGC AGTAACCAGG TAATATTGGC 3360
AAAGGCATCT CAGGAACATC ACCTTAGTGA GGAAACAAAA TGTTCTGCTA GCTTGTTTTC 3420
TTCACAGTGC AGTGAATTGG AAGACTTGAC TGCAAATACA AACACCCAGG ATCCTTTCTT 3480
GATTGGTTCT TCCAAACAAA TGAGGCATCA GTCTGAAAGC CAGGGAGTTG GTCTGAGTGA 3540
CAAGGAATTG GTTTCAGATG ATGAAGAAAG AGGAACGGGC TTGGAAGAAA ATAATCAAGA 3600
AGAGCAAAGC ATGGATTCAA ACTTAGGTAT TGGAACCAGG TTTTTGTGTT TGCCCCAGTC 3660
TATTTATAGA AGTGAGCTAA ATGTTTATGC TTTTGGGGAG CACATTTTAC AAATTTCCAA 3720
GTATAGTTAA AGGAACTGCT TCTTAAACTT GAAACATGTT CCTCCTAAGG TGCTTTTCAT 3780
AGAAAAAAGT CCTTCACACA GCTAGGACGT CATCTTTGAC TGAATGAGCT TTAACATCCT 3840
AATTACTGGT GGACTTACTT CTGGTTTCAT TTTATAAAGC AAATCCCGGT GTCCCAAAGC 3900
AAGGAATTTA ATCATTTTGT GTGACATGAA AGTAAATCCA GTCCTGCCAA TGAGAAGAAA 3960
AAGACACAGC AAGTTGCAGC GTTTATAGTC TGCTTTTACA TCTGAACCTC TGTTTTTGTT 4020
ATTTAAGGTG AAGCAGCATC TGGGTGTGAG AGTGAAACAA GCGTCTCTGA AGACTGCTCA 4080
GGGCTATCCT CTCAGAGTGA CATTTTAACC ACTCAGGTAA AAAGCGTGTG TGTGTGTGCA 4140
CATGCGTGTG TGTGGTGTCC TTTGCATTCA GTAGTATGTA TCCCACATTC TTAGGTTTGC 4200
TGACATCATC TCTTTGAATT AATGGCACAA TTGTTTGTGG TTCATTGTC 4249

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 710 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

NGNGAATGTA ATCCTAATAT TTCNCNCCNA CTTAAAAGA^ TACCACTCCA ANGGCATCNC 60
AATACATCAA TCAATTGGGG AATTGGGATT TTCCCTCNCT AACATCANTG GAATAATTTC 120
ATGGCATTAA TTGCATGAAT GTGGTTAGAT TAAAAGGTGT TCATGCTAGA ACTTGTAGTT 180
CCATACTAGG TGATTTCAAT TCCTGTGCTA AAATTAATTT GTATGATATA TTNTCATTTA 240
ATGGAAAGCT TCTCAAAGTA TTTCATTTTC TTGGTACCAT TTATCGTTTT TGAAGCAGAG 300
GGATACCATG CAACATAACC TGATAAAGCT CCAGCAGGAA ATGGCTGAAC TAGAAGCTGT 360
GTTAGAACAG CATGGGAGCC AGCCTTCTAA CAGCTACCCT TCCATCATAA GTGACTCTTC 420
TGCCCTTGAG GACCTGCGAA ATCCAGAACA AAGCACATCA GAAAAAGGTG TGTATTGTTG 480
GCCAAACACT GATATCTTAA GCAAAATTCT TTCCTTCCCC TTTATCTCCT TCTGAAGAGT 540
AAGGACCTAG CTCCAACATT TTATGATCCT TGCTCAGCAC ATGGGTAATT ATGGAGCCTT 600
GGTTCTTGTC CCTGCTCACA ACTAATATAC CAGTCAGAGG GACCCAAGGC AGTCATTCAT 660
GTTGTCATCT GAGATACCTA CAACAAGTAG ATGCTATGGG GAGCCCATGG 710

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 473 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

CCATTGGTGC TAGCATCTGT CTGTTGCATT GCTTGTGTTT ATAAAATTCT GCCTGATATA 60
CTTGTTAAAA ACCAATTTGT GTATCATAGA TTGATGCTTT TGAAAAAAAT CAGTATTCTA 120
ACCTGAATTA TCACTATCAG AACAAAGCAG TAAAGTAGAT TTGTTTTCTC ATTCCATTTA 180
AAGCAGTATT AACTTCACAG AAAAGTAGTG AATACCCTAT AAGCCAGAAT CCAGAAGGCC 240
TTTCTGCTGA CAAGTTTGAG GTGTCTGCAG ATAGTTCTAC CAGTAAAAAT AAAGAACCAG 300
GAGTGGAAAG GTAAGAAACA TCAATGTAAA GATGCTGTGG TATCTGACAT CTTTATTTAT 360
ATTGAACTCT GATTGTTAAT TTTTTTCACC ATACTTTCTC CAGTTTTTTT GCATACAGGC 420
ATTTATACAC TTTTATTGCT CTAGGATACT TCTTTTGTTT AATCCTATAT AGG 473

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 421 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

GGATAAGNTC AAGAGATATT TTGATAGGTG ATGCAGTGAT NAATTGNGAA AATTTNCTGC 60
CTGCTTTTAA TCTTCCCCCG TTCTTTCTTC CTNCCTCCCT CCCTTCCTNC CTCCCGTCCT 120
TNCCTTTCCT TTCCCTCCCT TCCNCCTTCT TTCCNTCTNT CTTTCCTTTC TTTCCTGTCT 180
ACCTTTCTTT CCTTCCTCCC TTCCTTTTCT TTTCTTTCTT TCCTTTCCTT TTCTTTCCTT 240
TCTTTCCTTT CCTTTCTTTC TTGACAGAGT CTTGCTCTGT CACTCAGGCT GGAGTGCAGT 300
GGCGTGATCT CGNCTCACTG CAACCTCTGT CTCCCAGGTT CAAGCAATTT TCCTGCCTCA 360
GCCTCCCGAG TAGCTGAGAT TACAGGCGCC AGCCACCACA CCCAGCTACT GACCTGCTTT 420
T 421

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 997 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

AAACAGCTGG GAGATATGGT GCCTCAGACC AACCCCATGT TATATGTCAA CCCTGACATA 60
TTGGCAGGCA ACATGAATCC AGACTTCTAG GCTGTCATGC GGGCTCTTTT TTGCCAGTCA 120
TTTCTGATCT CTCTGACATG AGCTGTTTCA TTTATGCTTT GGCTGCCCAG CAAGTATGAT 180
TTGTCCTTTC ACAATTGGTG GCGATGGTTT TCTCCTTCCA TTTATCTTTC TAGGTCATCC 240
CCTTCTAAAT GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT 300
CAGAATAGAA ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA 360
CAGCTGGAAG AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT 420
CTAGGTAATA TTTCATCTGC TGTATTGGAA CAAACACTYT GATTTTACTC TGAATCCTAC 480
ATAAAGATAT TCTGGTTAAC CAACTTTTAG ATGTACTAGT CTATCATGGA CACTTTTGTT 540
ATACTTAATT AAGCCCACTT TAGAAAAATA GCTCAAGTGT TAATCAAGGT TTACTTGAAA 600
ATTATTGAAA CTGTTAATCC ATCTATATTT TAATTAATGG TTTAACTAAT GATTTTGAGG 660
ATGWGGGAGT CKTGGTGTAC TCTAMATGTA TTATTTCAGG CCAGGCATAG TGGCTCACGC 720
CTGGTAATCC CAGTAYYCMR GAGCCCGAGG CAGGTGGAGC CAGCTGAGGT CAGGAGTTCA 780
AGACCTGTCT TGGCCAACAT GGGNGAAACC CTGTCTTCTT CTTAAAAAAN ACAAAAAAAA 840
TTAACTGGGT TGTGCTTAGG TGNATGCCCC GNATCCTAGT TNTTCTTGNG GGTTGAGGGA 900
GGAGATCACN TTGGACCCCG GAGGGGNGGG TGGGGGNGAG CAGGNCAAAA CACNGACCCA 960
GCTGGGGTGG AAGGGAAGCC CACTCNAAAA AANNTTN 997

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 639 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

TTTTTAGGAA ACAAGCTACT TTGGATTTCC ACCAACACCT GTATTCATGT ACCCATTTTT 60
CTCTTAACCT AACTTTATTG GTCTTTTTAA TTCTTAACAG AGACCAGAAC TTTGTAATTC 120
AACATTCATC GTTGTGTAAA TTAAACTTCT CCCATTCCTT TCAGAGGGAA CCCCTTACCT 180
GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG AAGACAGAGC 240
CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA AAGTTCCCCA 300
ATTGAAAGTT GCAGAATCTG CCCAGAGTCC AGCTGCTGCT CATACTACTG ATACTGCTGG 360
GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG CTTCAACAGA 420
AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG AATTTGTGAG 480
TGTATCCATA TGTATCTCCC TAATGACTAA GACTTAACAA CATTCTGGAA AGAGTTTTAT 540
GTAGGTATTG TCAATTAATA ACCTAGAGGA AGAAATCTAG AAAACAATCA CAGTTCTGTG 600
TAATTTAATT TCGATTACTA ATTTCTGAAA ATTTAGAAY 639
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 922 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

ncccnmcccc cnaatctcaa at«**=taa ccccccccc. accgakacnt cmthgckta 60
GAGANTTTAA TGGCCCNTTC TGAGGNACAN AAGCTTAAGC CAGGNGACGT GGANCNATGN 120
GTTGTTTNTT GTTTGGTTAC CTCCAGCCTG GGTGACAGAG CAAGACTCTG TCTAAAAAAA 180
AAAAAAAAAA AAATCGACTT TAAATAGTTC CAGGACACGT GTAGAACGTG CAGGATTGCT 240
ACGTAGGTAA ACATATGCCA TGGTGGGATA ACTAGTATTC TGAGCTGTGT GCTAGAGGTA 300
ACTCATGATA ATGGAATATT TGATTTAATT TCAGATGCTC GTGTACAAGT TTGCCAGAAA 360
ACACCACATC ACTTTAACTA ATCTAATTAC TGAAGAGACT ACTCATGTTG TTATGAAAAC 420
AGGTATACCA AGAACCTTTA CAGAATACCT TGCATCTGCT GCATAAAACC ACATGAGGCG 480
AGGCACGGTG GCGCATGCCT GTAATCGCAG CACTTTGGGA GGCCGAGGCG GGCAGATCAC 540
GAGATTAGGA GATCGAGACC ATCCTGGCCA GCATGGTGAA ACCCCGTCTC TACTANNAAA 600
TGGNAAAATT ANCTGGGTGT GGTCGCGTGC NCCTGTAGTC CCAGCTACTC GTGAGGCTGA 660
GGCAGGAGAA TCACTTGAAC CGGGGAAATG GAGGTTTCAG TGAGCAGAGA TCATNCCCCT 720
NCATTCCAGC CTGGCGACAG AGCAAGGCTC CGTCNCCNAA AAAATAAAAA AAAACGTGAA 780
CAAATAAGAA TATTTGTTGA GCATAGCATG GATGATAGTC TTCTAATAGT CAATCAATTA 840
CTTTATGAAA GACAAATAAT AGTTTTGCTG CTTCCTTACC TCCTTTTGTT TTGGGTTAAG 900
ATTTGGAGTG TGGGCCAGGC AC 922
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 867 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

GATCTATAGC TAGCCTTGGC GTCTAGAAGA TGGGTGTTGA GAAGAGGGAG TGGAAAGATA 60
TTTCCTCTGG TCTTAACTTC ATATCAGCCT CCCCTAGACT TCCAAATATC CATACCTGCT 120
GGTTATAATT AGTGGTGTTT TCAGCCTCTG ATTCTGTCAC CAGGGGTTTT AGAATCATAA 180
ATCCAGATTG ATCTTGGGAG TGTAAAAAAC TGAGGCTCTT TAGCTTCTTA GGACAGCACT 240
TCCTGATTTT GTTTTCAACT TCTAATCCTT TGAGTGTTTT TCATTCTGCA GATGCTGAGT 300
TTGTGTGTGA ACGGACACTG AAATATTTTC TAGGAATTGC GGGAGGAAAA TGGGTAGTTA 360
GCTATTTCTG TAAGTATAAT ACTATTTCTC CCCTCCTCCC TTTAACACCT CAGAATTGCA 420
TTTTTACACC TAACATTTAA CACCTAAGGT TTTTGCTGAT GCTGAGTCTG AGTTACCAAA 480
AGGTCTTTAA ATTGTAATAC TAAACTACTT TTATCTTTAA TATCACTTTG TTCAAGATAA 540
GCTGGTGATG CTGGGAAAAT GGGTCTCTTT TATAACTAAT AGGACCTAAT CTGCTCCTAG 600
CAATGTTAGC ATATGAGCTA GGGATTTATT TAATAGTCGG CAGGAATCCA TGTGCARCAG 660
NCAAACTTAT AATGTTTAAA TTAAACATCA ACTCTGTCTC CAGAAGGAAA CTGCTGCTAC 720
AAGCCTTAT? AAAGGGCTGT GGCTTTAGAG GGAAGGACCT CTCCTCTGTC ATTCTTCCTG 780
TGCTCTTTTG TGAATCGCTG ACCTCTCTAT CTCCGTGAAA AGAGCACGTT CTTCTGCTGT 840
ATGTAACCTG TCTTTTCTAT GATCTCT 867
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 561 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

NAAAAACGGG GNNGGGANTG GGCCTTAAAN CCAAAGGGCN AACTCCCCAA CCATTNAAAA 60
ANTGACNGGG GATTATTAAA ANCGGCGGGA AACATTTCAC NGCCCAACTA ATATTGTTAA 120
ATTAAAACCA CCACCNCTGC NCCAAGGAGG GAAACTGCTG CTACAAGCCT TATTAAAGGG 180
CTGTGGCTTT AGAGGGAAGG ACCTCTCCTC TGTCATTCTT CCTGTGCTCT TTTGTGAATC 240
GCTGACCTCT CTATGTCCGT GAAAAGAGCA CGTTCTTCGT CTGTATGTAA CCTGTCTTTT 300
CTATGATCTC TTTAGGGGTG ACCCAGTCTA TTAAAGAAAG AAAAATGCTG AATGAGGTAA 360
GTACTTGATG TTACAAACTA ACCAGAGATA TTCATTCAGT CATATAGTTA AAAATGTATT 420
TGCTTCCTTC CATCAATGCA CCACTTTCCT TAACAATGCA CAAATTTTCC ATGATAATGA 480
GGATCATCAA GAATTATGCA GGCCTGCACT GTGGCTCATA CCTATAATCC CAGCGCTTTG 540
GGAGGCTGAG GCGCTTGGAT C 561
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 567 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

AATTTTTTGT ATTTTTAGTA GAGATGAGGT TCACCATGTT GGTCTAGATC TGGTGTCGAA 60
CGTCCTGACC TCAAGTGATC TGCCAGCCTC AGTCTCCCAA AGTGCTAGGA TTACAGGGGT 120
GAGCCACTGC GCCTGGCCTG AATGCCTAAA ATATGACGTG TCTGCTCCAC TTCCATTGAA 180
GGAAGCTTCT CTTTCTCTTA TCCTGATGGG TTGTGTTTGG TTTCTTTCAG CATGATTTTG 240
AAGTCAGAGG AGATGTGGTC AATGGAAGAA ACCACCAAGG TCCAAAGCGA GCAAGAGAAT 300
CCCAGGACAG AAAGGTAAAG CTCCCTCCCT CAAGTTGACA AAAATCTCAC CCCACCACTC 360
TGTATTCCAC TCCCCTTTGC AGAGATGGGC CGCTTCATTT TGTAAGACTT ATTACATACA 420
TACACAGTGC TAGATACTTT CACACAGGTT CTTTTTTCAC TCTTCCATCC CAACCACATA 480
AATAAGTATT GTCTCTACTT TATGAATGAT AAAACTAAGA GATTTAGAGA GGCTGTGTAA 540
TTTGGATTCC CGTCTCGGGT TCAGATC 567

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 633 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

TTGGCCTGAT TGGTGACAAA AGTGAGATGC TCAGTCCTTG AATGACAAAG AATGCCTGTA 60
GAGTTGCAGG TCCAACTACA TATGCACTTC AAGAAGATCT TCTGAAATCT AGTAGTGTTC 120
T3GACATTGG ACTGCTTGTC CCTGGGAAGT AGCAGCAGAA ATGATCGGTG GTGAACAGAA 180
GAAAAAGAAA AGCTCTTCCT TTTTGAAAGT CTGTTTTTTG AATAAAAGCC AATATTCTTT 240
TATAACTAGA TTTTCCTTCT CTCCATTCCC CTGTCCCTCT CTCTTCCTCT CTTCTTCCAG 300
ATCTTCAGGG GGCTAGAAAT CTGTTGCTAT GGGCCCTTCA CCAACATGCC CACAGGTAAG 360
AGCCTGGGAG AACCCCAGAG TTCCAGCACC AGCCTTTGTC TTACATAGTG GAGTATTATA 420
AGCAAGGTCC CACGATGGGG GTTCCTCAGA TTGCTGAAAT GTTCTAGAGG CTATTCTATT 480
TCTCTACCAC TCTCCAAACA AAACAGCACC TAAATGTTAT CCTATGGCAA AAAAAAACTA 540
TACCTTGTCC CCCTTCTCAA GAGCATGAAG GTGGTTAATA GTTAGGATTC AGTATGTTAT 600
GTGTTCAGAT GGCGTTGAGC TGCTGTTAGT GCC 633


(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 470 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

TTTGAGAGAC TATCAAACCT TATACCAAGT GGCCTTATGG AGACTGATAA CCAGAGTACA 60
TGGCATATCA GTGGCAAATT GACTTAAAAT CCATACCCCT ACTATTTTAA GACCATTGTC 120
CTTTGGAGCA GAGAGACAGA CTCTCCCATT GAGAGGTCTT GCTATAAGCC TTCATCCGGA 180
GAGTGTAGGG TAGAGGGCCT GGGTTAAGTA TGCAGATTAC TGCAGTGATT TTACATGTAA 240
ATGTCCATTT TAGATCAACT GGAATGGATG GTACAGCTGT GTGGTGCTTC TGTGGTGAAG 300
GAGCTTTCAT CATTCACCCT TGGCACAGTA AGTATTGGGT GCCCTGTCAG TGTGGGAGGA 360
CACAATATTC TCTCCTGTGA GCAAGACTGG CACCTGTCAG TCCCTATGGA TGCCCCTACT 420
GTAGCCTCAG AAGTCTTCTC TGCCCACATA CCTGTGCCAA AAGACTCCAT 470
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 517 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

GGTGGTACGT GTCTGTAGTT CCAGCTACTT GGGAGGCTGA GATGGAAGGA TTGCTTGAGC 60
CCAGGAGGCA GAGGTGGNAN NTTACGCTGA GATCACACCA CTGCACTCCA GCCTGGGTGA 120
CAGAGCAAGA CCCTGTCTCA AAAACAAACA AAAAAAATGA TGAAGTGACA GTTCCAGTAG 180
TCCTACTTTG ACACTTTGAA TGCTCTTTCC TTCCTGGGGA TCCAGGGTGT CCACCCAATT 240
GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT TCCATGGTAA GGTGCCTCGC 300
ATGTACCTGT GCTATTAGTG GGGTCCTTGT GCATGGGTTT GGTTTATCAC TCATTACCTG 360
GTGCTTGAGT AGCACAGTTC TTGGCACATT TTTAAATATT TGTTGAATGA ATGGCTAAAA 420
TGTCTTTTTG ATGTTTTTAT TGTTATTTGT TTTATATTGT AAAAGTAATA CATGAACTGT 480
TTCCATGGGG TGGGAGTAAG ATATGAATGT TCATCAC 517


(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 434 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 


(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

CAGTAATCCT NAGAACTCAT ACGACCGGGC CCCTGGAGTC GNTGNTTNGA GCCTAGTCCN 60
GGAGAATGAA TTGACACTAA TCTCTGCTTG TGTTCTCTGT CTCCAGCAAT TGGGCAGATG 120
TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA GTGTAGCACT CTACCAGTGC 180
CAGGAGCTGG ACACCTACCT GATACCCCAG ATCCCCCACA GCCACTACTG ACTGCAGCCA 240
GCCACAGGTA CAGAGCCACA GGACCCCAAG AATGAGCTTA CAAAGTGGCC TTTCCAGGCC 300
CTGGGAGCTC CTCTCACTCT TCAGTCCTTC TACTGTCCTG GCTACTAAAT ATTTTATGTA 360
CATCAGCCTG AAAAGGACTT CTGGCTATGC AAGGGTCCCT TAAAGATTTT CTGCTTGAAG 420
TCTCCCTTGG AAAT 434

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
GATAAATTAA AACTGCGACT GCGCGGCGTG 30
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

GTAGTAGAGT CCCGGGAAAG GGACAGGGGG 30

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
ATATATATAT GTTTTTCTAA TGTGTTAAAG 30
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 
GTAAGTCAGC ACAAGAGTGT ATTAATTTGG 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 


(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

TTTCTTTTTC TCCCCCCCCT ACCCTGCTAG 30 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
GTAAGTTTGA ATGTGTTATG TGGCTCCATT 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
AGCTACTTTT TTTTTTTTTT TTTGAGACAG 30

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
GTAAGTGCAC ACCACCATAT CCAGCTAAAT 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
AATTGTTCTT TCTTTCTTTA TAATTTATAG 30
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
GTATATAATT TGGTAATGAT GCTAGGTTGG 30
(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

GAGTGTGTTT CTCAAACAAT TTAATTTCAG 30 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: 

GTAAGTGTTG AATATCCCAA GAATGACACT 30

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
AAACATAATG TTTTCCCTTG TATTTTACAG 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
GTAAAACCAT TTGTTTTCTT CTTCTTCTTC 30
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
TGCTTGACTG TTCTTTACCA TACTGTTTAG 30



(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
GTAAGGGTCT CAGGTTTTTT AAGTATTTAA
(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
TGATTTATTT TTTGGGGGGA AATTTTTTAG 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
GTGAGTCAAA GAGAACCTTT GTCTATGAAG 
(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
TCTTATTAGG ACTCTGTCTT TTCCCTATAG

(2) INFORMATION FOR SEQ ID NO: 54: 




(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
GTAATGGCAA AGTTTGCCAA CTTAACAGGC 


(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
GAGTACCTTG TTATTTTTGT ATATTTTCAG 




(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
GTATTGGAAC CAGGTTTTTG TGTTTGCCCC 
(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
ACATCTGAAC CTCTGTTTTT GTTATTTAAG 


(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
AGGTAAAAAG CGTGTGTGTG TGTGCACATG 


(2) INFORMATION FOR SEQ ID NO: 59: 


(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
CATTTTCTTG GTACCATTTA TCGTTTTTGA 
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
GTGTGTATTG TTGGCCAAAC ACTGATATCT
(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
AGTAGATTTG TTTTCTCATT CCATTTAAAG 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
GTAAGAAACA TCAATGTAAA GATGCTGTGG 
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:63: 

ATGGTTTTCT CCTTCCATTT ATCTTTCTAG 30

(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:64: 

GTAATATTTC ATCTGCTGTA TTGGAACAAA 30

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

TGTAAATTAA ACTTCTCCCA TTCCTTTCAG 30


(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 
GTGAGTGTAT CCATATGTAT CTCCCTAATG 
(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

ATGATAATGG AATATTTGAT TTAATTTCAG
(2) INFORMATION FOR SEQ ID NO: 68:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:

GTATACCAAG AACCTTTACA GAATACCTTG 30 

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:69: 

CTAATCCTTT GAGTGTTTTT CATTCTGCAG

(2) INFORMATION FOR SEQ ID NO: 70:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 


(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

GTAAGTATAA TACTATTTCT CCCCTCCTCC 30 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
TGTAACCTGT CTTTTCTATG ATCTCTTTAG
(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:

GTAAGTACTT GATGTTACAA ACTAACCAGA

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
TCCTGATGGG TTGTGTTTGG TTTCTTTCAG 


(2) INFORMATION FOR SEQ ID NO: 74:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
GTAAAGCTCC CTCCCTCAAG TTGACAAAAA 
(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:

CTGTCCCTCT CTCTTCCTCT CTTCTTCCAG 30


(2) INFORMATION FOR SEQ ID NO: 76: 


(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
GTAAGAGCCT GGGAGAACCC CAGAGTTCCA 
(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 
AGTGATTTTA CATGTAAATG TCCATTTTAG


(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 
GTAAGTATTG GGTGCCCTGT CAGTGTGGGA 


(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:
TTGAATGCTC TTTCCTTCCT GGGGATCCAG


(2) INFORMATION FOR SEQ ID NO: 80: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:

GTAAGGTGCC TCGCATGTAC CTGTGCTATT

(2) INFORMATION FOR SEQ ID NO: 81: 




(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:81: 
CTAATCTCTG CTTGTGTTCT CTGTCTCCAG
(2) INFORMATION FOR SEQ ID NO: 82:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:

Cys Pro Ile Cys Leu Glu Leu Ile Lys Glu Pro Val Ser Thr Lys Cys
1 5 10 15

Asp His Ile Phe Cys Lys Phe Cys Met Leu Lys Leu Leu Asn Gln Lys
20 25 30

Lys Gly Pro Ser Gln Cys Pro Leu Cys Lys
35 40


(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:



Cys 

Pro 

lie 

Cys 

Leu 

Glu 

Leu 

Leu 

Lys 

Glu 

Pro 

Val 

Ser 
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1 




5 





10 





15 


cn 

Asn 

His 

Ser 

Phe 

Cys 

Arg 
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Cys 

lie 

Thr 
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20 
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Arg 
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Thr 
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Gly Lys 

Gly Asn 

Cys 

Pro 

Val 

Cys 

Arg 
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40 


(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:

Cys Pro Ile Cys Leu Asp Met Leu Lys Asn Thr Met Thr Thr Lys Glu
1 5 10 15

Cys Leu His Arg Phe Cys Ser Asp Cys Ile Val Thr Ala Leu Arg Ser
20 25 30

Gly Asn Lys Glu Cys Pro Thr Cys Arg
35 40


(2) INFORMATION FOR SEQ ID NO: 85:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:

Cys Pro Val Cys Leu Gln Tyr Phe Ala Glu Pro Met Met Leu Asp Cys
1 5 10 15

Gly His Asn Ile Cys Cys Ala Cys Leu Ala Arg Cys Trp Gly Thr Ala
20 25 30

Cys Thr Asn Val Ser Cys Pro Gln Cys Arg
35 40
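
Each peptide entry in the listing declares a LENGTH and then gives the residues in three-letter code. As a minimal sketch (not part of the original listing), the following Python converts the residues of SEQ ID NO: 85, transcribed with the standard abbreviations Ile and Gln, to one-letter code and checks the count against the declared length of 42. The names `THREE_TO_ONE`, `to_one_letter`, `SEQ_ID_85` and `DECLARED_LENGTH` are illustrative assumptions.

```python
# Standard three-letter to one-letter amino acid abbreviations.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues: str) -> str:
    """Convert whitespace-separated three-letter residues to one-letter code."""
    return "".join(THREE_TO_ONE[r] for r in residues.split())


# Residues as given for SEQ ID NO: 85 (hypothetical variable name).
SEQ_ID_85 = (
    "Cys Pro Val Cys Leu Gln Tyr Phe Ala Glu Pro Met Met Leu Asp Cys "
    "Gly His Asn Ile Cys Cys Ala Cys Leu Ala Arg Cys Trp Gly Thr Ala "
    "Cys Thr Asn Val Ser Cys Pro Gln Cys Arg"
)
DECLARED_LENGTH = 42  # from "(A) LENGTH: 42 amino acids"

one_letter = to_one_letter(SEQ_ID_85)
length_matches = len(one_letter) == DECLARED_LENGTH
```

A residue that is not a standard abbreviation raises `KeyError`, which makes misread codes in a transcribed listing easy to spot.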


Claims


1. An isolated nucleic acid coding for the BRCA1 polypeptide having the amino acid sequence set forth in SEQ ID NO: 2,
or a modified form of said polypeptide which is functionally equivalent or associated with a predisposition to breast
or ovarian cancer.

2. An isolated nucleic acid as claimed in claim 1 which is a DNA comprising the nucleotide sequence set forth in
SEQ ID NO: 1 or a corresponding RNA.

3. An isolated nucleic acid as claimed in claim 1 which is a DNA comprising an allelic variant of the nucleotide sequence
set forth in SEQ ID NO: 1 or a corresponding RNA.

4. An isolated nucleic acid as claimed in claim 1 which is a DNA comprising a mutated form of the nucleotide sequence
set forth in SEQ ID NO: 1 associated with a predisposition to breast or ovarian cancer or a corresponding RNA.

5. An isolated nucleic acid as claimed in claim 4, wherein the mutation is a deletion mutation.

6. An isolated nucleic acid as claimed in claim 4, wherein the mutation is a nonsense mutation.

7. An isolated nucleic acid as claimed in claim 4, wherein the mutation is an insertion mutation.

8. An isolated nucleic acid as claimed in claim 4, wherein the mutation is a missense mutation.

9. An isolated nucleic acid as claimed in claim 4 which is a DNA comprising a nucleotide sequence selected from:

(a) SEQ ID NO: 1 having T substituted for C at position 4056;

(b) SEQ ID NO: 1 having an extra C at nucleotide position 5385;

(c) SEQ ID NO: 1 having G substituted for T at position 5443; and

(d) SEQ ID NO: 1 having 11 base pairs at nucleotide positions 189-199 deleted,

or a corresponding RNA.

10. An isolated nucleic acid as claimed in any one of claims 1 to 9 which is a DNA containing BRCA1 gene regulatory
sequences.

11. An isolated DNA as claimed in claim 2 or claim 3 wherein the nucleotide sequence set forth in SEQ ID NO: 1 or an
allelic variant thereof is operably-linked to BRCA1 gene regulatory sequences having a mutation which in vivo inhibits
or prevents expression of the BRCA1 polypeptide.

12. Use of an isolated nucleic acid having a portion of the nucleotide sequence of a nucleic acid as claimed in any one
of claims 1 to 9 as a hybridization probe to detect in a sample (i) a DNA having a nucleotide sequence selected
from the nucleotide sequence set forth in SEQ ID NO: 1, allelic variants thereof and mutated forms thereof associated
with predisposition to breast or ovarian cancer or (ii) an RNA corresponding to said DNA.

13. A nucleic acid probe suitable for a use as claimed in claim 11 wherein the nucleotide sequence of said probe
comprises the DNA sequence set forth in SEQ ID NO: 1 from nucleotide position 3631 to 3930, a DNA probe sequence
as set forth in Table 9 or an RNA corresponding thereto.

14. A nucleic acid suitable for a use as claimed in claim 12 wherein the sequence of said probe is a portion of a nucleic
acid as claimed in any one of claims 4 to 9 including a mutation compared to the nucleotide sequence set forth in
SEQ ID NO: 1.

15. An isolated nucleic acid having at least 15 contiguous nucleotides of a nucleic acid as claimed in any one of claims
4 to 9 and including a mutation compared to the nucleotide sequence set forth in SEQ ID NO: 1.

16. A replicative cloning vector which comprises an isolated DNA according to any one of claims 1 to 11 and 13 to 15
and a replicon operative in a host cell for said vector.

17. An expression vector which comprises an isolated DNA according to any one of claims 1 to 9 wherein the coding
sequence for the BRCA1 polypeptide or modified form thereof is operably-linked to a promoter sequence capable
of directing expression of said coding sequence in host cells for said vector.

18. Host cells transformed with a vector as claimed in claim 16 or claim 17.

19. A method of producing a polypeptide which is the BRCA1 polypeptide having the amino acid sequence set forth in
SEQ ID NO: 2 or a modified form of said polypeptide as defined in claim 1 which comprises (i) culturing host cells as
claimed in claim 16 containing an expression vector encoding said polypeptide under conditions suitable for
production of said polypeptide and (ii) recovering said polypeptide.

20. A method as claimed in claim 19 which further comprises labelling the recovered polypeptide.

21. A preparation of human BRCA1 polypeptide substantially free of other human proteins, said polypeptide having the
amino acid sequence set forth in SEQ ID NO: 2.

22. A preparation of a BRCA1 polypeptide substantially free of other human proteins, the amino acid sequence of said
polypeptide having substantial sequence homology with the wild-type BRCA1 polypeptide having the amino acid
sequence set forth in SEQ ID NO: 2, and said polypeptide having substantially similar function as the wild-type
BRCA1 polypeptide.

23. A preparation of a polypeptide substantially free of other proteins, said polypeptide being a mutated human BRCA1
polypeptide obtainable by expression of a mutated form of the nucleotide sequence set forth in SEQ ID NO: 1
associated in humans with predisposition for breast or ovarian cancer.

24. A preparation of a polypeptide as claimed in claim 23, said polypeptide being encoded by a mutated form of
SEQ ID NO: 1 as defined in claim 9.

25. A preparation as claimed in any one of claims 21 to 24 wherein said polypeptide is labelled.

26. An antibody capable of specifically binding one or more polypeptides as claimed in any one of claims 21 to 24.

27. An antibody as claimed in claim 26 which is a monoclonal antibody.

28. A preparation of a polypeptide substantially free of other proteins, said polypeptide being an antigenic fragment of
a polypeptide as defined in any one of claims 21 to 24 and which is suitable for use as an immunogen to obtain an
antibody as claimed in claim 26.

29. A polypeptide as defined in any one of claims 21 to 24 and 28 in the form of a fusion protein.

30. Use of a polypeptide as defined in any one of claims 21 to 24, 28 and 29 as an immunogen for antibody production.

31. A use as claimed in claim 30, wherein one or more antibodies produced are subsequently labelled or bound to a
solid support.

32. A pair of single-stranded oligonucleotide primers for determination of a nucleotide sequence of a BRCA1 gene by
a nucleic acid amplification reaction, the sequence of said primers being derived from human chromosome 17q and
the use of said primers in a nucleic acid amplification reaction resulting in the synthesis of DNA and/or RNA
corresponding to all or part of the sequence of the BRCA1 gene.

33. A pair of primers as claimed in claim 32 for determination of all or part of the sequence of the BRCA1 gene having
the nucleotide sequence set forth in SEQ ID NO: 1 and/or a mutated form thereof associated in humans with
predisposition to breast or ovarian cancer.

34. A method for identifying a mutant BRCA1 nucleotide sequence in a suspected mutant BRCA1 allele which comprises
comparing the nucleotide sequence of the suspected mutant BRCA1 allele with the wild-type BRCA1 nucleotide
sequence, wherein a difference between the suspected mutant and the wild-type sequence identifies a mutant
BRCA1 nucleotide sequence.

35. A kit for detecting mutations in the BRCA1 gene resulting in susceptibility to breast and ovarian cancers comprising
at least one oligonucleotide primer specific for a BRCA1 gene mutation and instructions relating to detecting
mutations in the BRCA1 gene.

36. A kit for detecting mutations in the BRCA1 gene resulting in susceptibility to breast and ovarian cancers comprising
at least one allele-specific oligonucleotide probe for a BRCA1 gene mutation and instructions relating to detecting
mutations in the BRCA1 gene.

37. A nucleic acid selected from a wild-type BRCA1 gene nucleic acid, a nucleic acid substantially homologous and
having substantially similar function to said wild-type BRCA1 gene nucleic acid and functionally equivalent portions
thereof for use in gene therapy to supply a wild-type BRCA1 gene function or a BRCA1 function substantially similar
to the wild type to a cell which has lost said gene function or has altered gene function by virtue of a mutation in the
BRCA1 gene.

38. A nucleic acid as claimed in claim 37 which contains the BRCA1 gene regulatory sequences.

39. A nucleic acid as claimed in claim 37 or claim 38 for use in gene therapy wherein said nucleic acid is incorporated
into the genome of said cell.

40. A molecule selected from a wild-type BRCA1 polypeptide, a polypeptide substantially homologous and having
substantially similar function to said wild-type BRCA1 polypeptide, functional portions thereof and molecules which
mimic the function of said wild-type BRCA1 polypeptide for use in peptide therapy to supply a wild-type BRCA1
gene function or a BRCA1 function substantially similar to the wild-type to a cell which has lost said gene function
or has altered gene function by virtue of a mutation in the BRCA1 gene.

41. A method for screening potential cancer therapeutics which comprises: combining (i) a BRCA1 polypeptide binding
partner, (ii) a BRCA1 polypeptide selected from the group consisting of a polypeptide having the amino acid
sequence set forth in SEQ ID NO: 2 and a polypeptide having a portion of said amino acid sequence which binds
to said binding partner and (iii) a compound suspected of being a cancer therapeutic, and determining the amount
of binding of the BRCA1 polypeptide to its binding partner.

42. A method for screening potential cancer therapeutics which comprises combining a BRCA1 polypeptide binding
partner and a compound suspected of being a cancer therapeutic and measuring the biological activity of the binding
partner.

43. A method for screening potential cancer therapeutics which comprises growing a transformed eukaryotic host cell
containing an altered BRCA1 gene associated with a predisposition to cancer in the presence of a compound
suspected of being a cancer therapeutic and determining the rate of growth of said host cell.

44. A method for screening potential cancer therapeutics which comprises administering a compound suspected of
being a cancer therapeutic to a transgenic animal which carries in its genome an altered BRCA1 gene associated
with a predisposition to cancer from a second animal and determining the development or growth of a cancer lesion.

45. A transgenic animal which carries an altered BRCA1 allele.

46. A transgenic animal as claimed in claim 45 wherein the altered BRCA1 allele contains a deletion.

47. A transgenic animal as claimed in claim 45 wherein the altered BRCA1 allele contains a nonsense mutation.

48. A transgenic animal as claimed in claim 45 wherein the altered BRCA1 allele contains a frameshift mutation.

49. A transgenic animal as claimed in claim 45 wherein the altered BRCA1 allele contains a missense mutation.

50. A transgenic animal as claimed in claim 45 wherein the altered BRCA1 allele is a disrupted allele.
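
Claim 9 names three kinds of sequence change (substitution, single-base insertion, deletion of a range) and claim 34 identifies a mutant by comparing it with the wild-type sequence. The Python below is a minimal sketch of those operations on a short, hypothetical toy string; it does not use SEQ ID NO: 1, and the function names, the 1-based position convention and the rule that an inserted base occupies the stated position are all assumptions made for illustration.

```python
from typing import Optional


def substitute(seq: str, pos: int, ref: str, alt: str) -> str:
    """Replace the base at 1-based position `pos`, checking it equals `ref`."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def insert_base(seq: str, pos: int, base: str) -> str:
    """Insert `base` so that it occupies 1-based position `pos` (assumed convention)."""
    return seq[:pos - 1] + base + seq[pos - 1:]


def delete_range(seq: str, start: int, end: int) -> str:
    """Delete the 1-based inclusive positions start..end."""
    return seq[:start - 1] + seq[end:]


def first_difference(wild: str, test: str) -> Optional[int]:
    """Return the first 1-based position where the sequences differ, or None."""
    for i, (w, t) in enumerate(zip(wild, test), start=1):
        if w != t:
            return i
    if len(wild) != len(test):
        return min(len(wild), len(test)) + 1
    return None


WILD_TYPE = "ACGTACGTACGT"  # hypothetical toy sequence, NOT SEQ ID NO: 1

sub_allele = substitute(WILD_TYPE, 4, "T", "G")  # substitution, as in 9(a) and 9(c)
ins_allele = insert_base(WILD_TYPE, 5, "C")      # extra base, as in 9(b)
del_allele = delete_range(WILD_TYPE, 2, 4)       # deleted range, as in 9(d)
```

A plain position-by-position comparison reports only where two sequences first diverge; after an insertion or deletion every later position is shifted, so characterising such changes fully would need an alignment step that this sketch omits.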

1 gaggctagagggcaggcactttatggcaaactcaggtagaattcttcctcttccgtctct
61 ttccttttacgtcatcggggagactgggtggcaatcgcagcccgagagacgcatggctct 
121 ttctgccctccatcctctgatgtaccttgatttcgtattctgagaggctgctgcttagcg 
181 gtagccccttggtttccgtggcaacggaaaagcgcgggaattacagataaattaaaactg 
241 cgactgcgcggcgtgAGCTCGCTGAGACTTCCTGGACCCCGCACCAGGCTGTGGGGTTTC 

301 TCAGATAACTGGGCCCCTGCGCTCAGGAGGCCTTCACCCTCTGCTCTGGGTAAAGgtagt
361 agagtcccgggaaagggacagggggcccaagtgatgctctggggtactggcgtgggagag 
421 tggatttccgaagctgacagatgggtattctttgacggggggtaggggcggaacctgaga 

481 ggcgtaaggcgttgtgaaccctggggaggggggcagtttgtaggtcgcgagggaagcgct
541 gaggatcaggaagggggcactgagtgtccgtgggggaatcctcgtgataggaactggaat 
601 atgccttgagggggacactatgtctttaaaaacgtcggctggtcatgaggtcaggagttc 
661 cagaccagcctgaccaacgtggtgaaactccgtctctactaaaaatacmaaaattagccg 
721 ggcgtggtgccgctccagctactcaggaggctgaggcaggagaatcgctagaacccggga 
781 ggcggaggttgcagtgagccgagatcgcgccattgcactccagcctgggcgacagagcga 
841 gactgtctcaaaacaaaacaaaacaaaacaaaacaaaaaacaccggctggtatgtatgag
901 aggatgggaccttgtggaagaagaggtgccaggaatatgtctgggaaggggaggagacag 
961 gattttgtgggagggagaacttaagaactggatccatttgcgccattgagaaagcgcaag 

1021 agggaagtagaggagcgtcagtagtaacagatgctgccggcagggatgtgcttgaggagfj 
1081 atccagagatgagagcaggtcactgggaaaggttaggggcggggaggccttgattggtgt 
1141 tggtttggtcgttgttgattttggttttatgcaagaaaaagaaaacaaccagaaacattg 

1201 gagaaagctaaggctaccaccacctacccggtcagtcactcctctgtagctttctctttc
1261 ttggagaaaggaaaagacccaaggggttggcagcgatatgtgaaaaaattcagaatttat 

1321 gttgtctaattacaaaaagcaacttctagaatctttaaaaataaaggacgttgtcattag 

1381 ttcttctggtttgtattattctaaaaccttccaaatcttcaaatttactttattttaaaa
1441 tgataaaatgaagttgtcattttataaaccttttaaaaagatatatatatatgtttttct
1501 aatgtgttaaagTTCATTGGAACAGAAAGAAATGGATTTATCTGCTCTTCGCGTTGAAGA

1561 AGTACAAAATGTCATTAATGCTATGCAGAAAATCTTAGAGTGTCCCATCTGgtaagtcag
1621 cacaagagtgtattaatttgggattcctatgattatctcctatgcaaatgaacagaattg 

1681 accttacatactagggaagaaaagacatgtctagtaagattaggctattgtaattgctga
1741 ttttcttaactgaagaactttaaaaatatagaaaatgattccttgttctccatccactct
1801 gcctctcccactcctctccttttcaacacaatcctgtggtccgggaaagacagggctctg
1861 tcttgattggttctgcactgggcaggatctgttagatactgcatttgctttctccagctc
1921 taaavvvvvvvvvvvvvaaatgctgatgatagtatagagtattgaagggatcaatataat
1981 tctgttttgatatctgaaagctcactgaaggtaaggatcgtattctctgctgtattctca 
2041 gttcctgacacagcagacatttaataaatattgaacgaacttgaggccttatgttgactc 
2101 agtcataacagctcaaagttgaacttattcactaagaatagctttatttttaaataaatt 
2161 attgagcctcatttattttctttttctcccccccctaccctgctagTCTGGAGTTGATCA 
2221 AGGAACCTGTCTCCACAAAGTGTGACCACATATTTTGCAAgtaagtttgaatgtgttatg 
2281 tggctccattattagcttttgtttttgtccttcataacccaggaaacacctaactttata 
2341 gaagctttactttcttcaattaagtgagaacgaaaatccaactccatttcattctttctc

2401 agagagtatatagttatcaaaagttggttgtaatcatagttcctggtaaagttttgacat
2461 atattatctttttttttttttgagacaagtctcgctctgtcgcccaggctggagtgcagt 

2521 ggcatgaggcttgctcactgcacctccgcccccgagttcagcgactctvvvvvvvvvvvv

2581 vtgagatctagaccacatggtcaaagagatagaatgtgagcaataaatgaaccttaaatt
2641 tttcaacagctactttttttttttttttttgagacagGGKCTTACTCTGTTGTCCCAGCT 

2701 GGAGTACAGWGTGCGATCATGAGGCTTACTGTTGCTTGACTCCTAGGCTCAAGCGATCCT
2761 ATCACCTCAGTCTCCAAGTAGCTGGACTgtaagtgcacaccaccatatccagctaaattt
2821 tgtgttttctgtagagacggggtttcgccatgtttcccaggctggtcttgaactttgggc
2881 ttaacccgtctgcccacctaggcatcccaaagtgctaggattacaggtgtgagtcatcat
2941 gcctggccagtattttagttagctctgtcttttcaagtcatatacaagttcattttcttt
3001 taagtttagttaacaaccttatatcatgtattcttttctagcataaagaaagattcgagg
3061 ccvvvvvvvvvvvvvtgtgatcataacagtaagccatatgcatgtaagttcagttttcat
3121 agatcattgcttatgtagtttaggtttttgcttatgcagcatccaaaaacaattaggaaa 
3181 ctattgcttgtaattcacctgccattactttttaaatggctcttaagggcagttgtgaga 
3241 ttatcttttcatggctatttgccttttgagtattctttctacaaaaggaagtaaactaaa 
3301 ttgttctttctttctttataatttatagATTTTGCATGCTGAAACTrCTCAACCAGAAGA
3361 AAGGGCCTTCACAGTGTCCTTTATCTAAGAATGATATAACCAAAAGgtatataatttggt
3421 aatgatgctaggttggaagcaaccacagtaggaaaaagtagaaattatttaataacatag
3481 cgttcctataaaaccattcatcagaaaaatttataaaagagtttttagcacacagtaaat
3541 tatttccaaagttattttcctgaaagttttatgggcatctgccttatacaggtattgvvv
3601 vvvvvvvvvvggtaggcttaaatgaatgacaaaaagttactaaatcactgccAtcacacg
3661 gtttatacagatgtcaatgatgtattgattatagaggttttctactgttgctgcatctta 
3721 tttttatttgtttacatgtcttttcttattttagtgtccttaaaaggttgataatcactt
3781 gctgagtgtgtttctcaaacaatttaatttcagGAGCCTACAAGAAAGTACGAGATTTAG 
3 841 TCAACTTGTTGAAGAGCTATTGAAAATCATTTGTGCITTTCAGCTTGACACAGGTTTGGA 
3901 GTgtaagtgttgaatatcccaagaatgacactcaagtgctgtccatgaaaactcaggaag

3961 tttgcacaattactttctatgacgtggtgataagaccttttagtctaggttaattttagt

4 021 tctgtatctgtaatctattttaaaaaattactcccactggtctcacaccttatttvww 
4081 vwwwvaaaaaatcacaggtaaccttaatgcattgtcttaacacaacaaagagcatic 
4141 atagggtttctcttggtttctttgattataattcatacatttttctctaactgcaaacat 
4201 aatgttttcccttgtattttacagATGCAAACAGCTATAATTTTGCAAAAAAGGAAAATA 
4261 ACTCTCCTGAACATCTAAAAGATGAAGTTT CTATCATC CAAAGTATGGG CTACAG AAAC C 
4321 GTGCCAAAAGACTTCTACAGAGTGAACCCGAAAATCCTTCCTTGgtaaaaccatttgttt 
4381 tcttcttcttcttcttcttcttttcttttttttttctttttttttttgagatggagtctt
4441 gctctgtggcccaggctagaagcagtcctcctgccttagccnccttagtagctgggatta
4501 caggcacgcgcaccatgccaggctaatttttgtatttttagtagagacggggtttcatca
4561 tgttggccaggctggtctcgaactcctaacctcaggtgatcvvvvvvvvvvvvvatgatg
4621 gagatcttaaaaagtaatcattctggggctgggcgtagtagcttgcacctgtaatcccag
4681 cacttcgggaggctgaggcaggcagataatttgaggtcaggagtttgagaccagcctggc
4741 caacatggtgaaacccatctctactaaaaatacaaaaattagctgggtgtggtggcacgt
4801 acctgtaatcccagctactcgggaggcggaggcacaagaattgcttgaacctaggacgcg
4861 gaggttgcagcgagccaagatcgcgccactgcactccagcctgggccgtagagtgagact
4921 ctgtctcaaaaaagaaaaaaaagtaattgttctagctgggcgcagtggctcttgcctgta
4981 atcccagcactttgggaggccaaggcgggtggatctcgagtcctagagttcaagaccagc
5041 ctaggcaatgtggtgaaaccccatcgctacaaaaaatacaaaaattagccaggcatggtg 
5101 gcgtgcgcatgtagtcccagctccttgggaggctgaggtgggaggatcacttgaacccag 
5161 gagacagaggttgcagtgaaccgagatcacgccaccacgctccagcctgggcaacagaac 
5221 aagactctgtctaaaaaaatacaaataaaataaaagtagttctcacagtaccagcattca 
5281 tttttcaaaagatatagagctaaaaaggaaggaaaaaaaaagtaatgttgggcttttaaa 
5341 tactcgttcctatactaaatgttcttaggagtgctggggttttattgtcatcatttatcc 
5401 tttttaaaaatgttattggccaggcacggtggctcatggctgtaatcccagcactttggg 
5461 aggccgaggcaggcagatcacctgaggtcaggagtgtgagaccagcctggccaacatggc 
5521 gaaacctgtctctactaaaaatacaaaaattaactaggcgtggtggtgtacgcctgtagt 
5581 cccagctactcgggaggctgaggcaggagaatcaactgaaccagggaggtggaggttgca 
5641 gtgtgccgagatcacgccactgcactctagcctggcaacagagcaagattctgtctcaaa 
5701 aaaaaaaaacatatatacacatatatcccaaagtgctgggattacatatatatatatata 
5761 tatatattatatatatatatatatatatatgtgatatatatgtgatatatatataacata 
5821 tatatatgtaatatatatgtgatatatatataatatatatatgtaatatatatgtgatat 
5881 atatatatacacacacacacacatatatatgtatgtgtgtgtacacacacacacacaaat 
5941 tagccaggcatagttgcacacgcttggtagacccagctactcaggaggctgagggaggag 
6001 aatctcttgaacttaggaggcggaggttgcagtgagctgagattgcgccactgcactcca 
6061 gcctgggtgacagagcaggactctgtacaccccccaaaacaaaaaaaaaagttatcagat 
6121 gtgattggaatgtatatcaagtatcagcttcaaaatatgctatattaatacttcaaaaat 
6181 tacacaaataatacataatcaggtttgaaaaatttaagacaacmsaaraaaaaawycmaa 
6241 tcacamatatcccacacattttattattmctmctmcwattattttgwagagmctgggtct 
6301 cacycykttgctwatgctggtctttgaacyccykgccycaarcartcctsctccaJbcctc 
6361 ccaargtgctggggatwataggcatgarctaaccgcacccagccccagacattttagtgt 
6421 gtaaattcctgggcattttttcaaggcatcatacatgttagctgactgatgatggtcaat 
6481 ttattttgtccatggtgtcaagtttctcttcaggaggaaaagcacagaactggccaacaa 
6541 ttgcttgactgttctttaccatactgtttagCAGGAAACCAGTCTCAGTGTCCAACTCTC 
6601 TAACCTTGGAACTGTGAGAACTCTGAGGACAAAGCAGCGGATAGAACCTCAAAAGACGTC 
6661 TGTCTACATTGAATTGGgtaagggtctcaggttttttaagtatttaataataattgctgg 
6721 attccttatcttatagttttgccaaaaatcttggtcataatttgtatttgtggtaggcag 
6781 ctttgggaagtgaattttatgagccctatggtgagttataaaaaatgtaaaagacgcagt 
6841 tcccaccttgaagaatcttactttaaaaagggagcaaaagaggccaggcatggtggctca 
6901 cacctgtaatcccagcactttgggaggccaaagtgggtggatcacctgaggtcgggagtt 
6961 cgagaccagcctagccaacatggagaaactctgtctgtaccaaaaaataaaaaattagcc 
7021 aggtgtggtggcacataactgtaatcccagctactcgggaggctgaggcaggagaatcac 
7081 ttgaacccgggaggtggaggttgcggtgaaccgagatcgcaccattgcactccagcctgg 
7141 gcaaaaatagcgaaactccatctaaaaaaaaaaaagagagcaaaagaaagamtmtctg^t 
7201 tttaamtmtgtgtaaatatgtttttggaaagatggagagtagcaataagaaaaaacatga 
7261 tggattgctacagtatttagttccaagataaattgtactagatgaggaagccttttaaga 
7321 agagctgaattgccaggcgcagtggctcacgcctgtaatcccagcactttgggaggccga 
7381 ggtgggcggatcacctgaggtcgggagttcaagaccagcctgaccaacatggagaaaccc 
7441 catctctactaaaaaaaaaaaaaaaaaaattagccggggtggtggcttatgcctgtaatc 
7501 ccagctactcaggaggctgaggcaggagaatcgcttgaacccaggaagcagaggttgcag 
7561 tgagccaagatcgcaccattgcactccagcctaggcaacaagagtgaaactccatctcaa 
7621 aaaaaaaaaaaaagagctgaatcttggctgggcaggatggctcgtgcctgtaatcctaac 
7681 gctttggaagaccgaggcagaaggattggttgagtccacgagtttaagaccagcctggcc 
7741 aacataggggaaccctgtctctatttttaaaataataatacatttttggccggtgcggtg 
7801 gctcatgcctgtaatcccaatactttgggaggctgaggcaggtagatcacctgaggtcag 
7861 agttcgagaccagcctggataacctggtgaaacccctctttactaaaaatacaaaaaaaa 
7921 aaaaaaattagctgggtgtggtagcacatgcttgtaatcccagctacttgggaggctgag 
7981 gcaggagaatcgcttgaaccagggaggcggaggttacaatgagccaacactacaccactg 
8041 cactccagcctgggcaatagagtgagactgcatctcaaaaaaataataatttttaaaaat 
8101 aataaatttttttaagcttataaaaagaaaagttgaggccagcatagtagctcacatctg 
8161 taatctcagcagtggcagaggattgcttgaagccaggagtttgagaccagcctgggcaac 
8221 atagcaagacctcatctctacaaaaaaatttcttttttaaattagctgggtgtggtggtg 
8281 tgcatctgtagtcccagctactcaggaggcagaggtgagtggatacattgaacccaggag 
8341 tttgaggctgtagtgagctatgatcatgccactgcactccaacctgggtgacagagcaag 
8401 acctccaaaaaaaaaaaaaaaagagctgctgagctcagaattcaaactgggctctcaaat 
8461 tggattttcttttagaatatatttataattaaaaaggatagccatcttttgagctcccag 
8521 gcaccaccatctatttatcataacacttactgttttccccccttatgatcataaattcct 
8581 agacaacaggcattgtaaaaatagttatagtagttgatatttaggagcacttaactatat 
8641 tccaggcactattgtgcttttcttgtataactcattagatgcttgtcagacctctgagat 
8701 tgttcctattatacttattttacagatgagaaaattaaggcacagagaagttatgaaatt 
8761 tttccaaggtattaaacctagtaagtggctgagccatgattcaaacctaggaagttagat 
8821 gtcagagcctgtgctttttttttgtttttgtttttgttttcagtagaaacgggggtctca 
8881 ctttgttggccaggctggtcttgaactcctaacctcaaataatccacccatctcggcctc 
8941 ctcaagtgctgggattacaggtgagagccactgtgcctggcgaagcccatgcctttaacc 
9001 acttctctgtattacatactagcttaactagcattgtacctgccacagtagatgctcagt 
9061 aaatatttctagttgaatatctgtttttcaacaagtacatttttt£aacccttttaatta 
9121 agaaaacttttattgatttattttttggggggaaattttttagGATCTGATTCTTCTGAA 
9181 GATACCGTTAATAAGGCAACTTATTGCAGgtgagtcaaagagaacctttgtctatgaagc 
9241 tggtattttcctatttagttaatattaaggattgatgtttctctctttttaaaaatattt 
9301 taacttttattttaggttcagggatgtatgtgcagtttgttatataggtaaacacacgac 
9361 ttgggatttggtgtatagatttttttcatcatccgggtactaagcataccccacagtttt 
9421 ttgtttgctttctttctgaatttctccctcttcccaccttcctccctcaagtaggctggt 
9481 gtttctccagactagaatcatggtattggaagaaaccttagagatcatctagtttagttc 
9541 tctcattttatagtggaggaaataccctttttgtttgttggatttagttattagcactgt 
9601 ccaaaggaatttaggataacagtagaactctgcacatgcttgcttctagcagattgttct 
9661 ctaagttcctcatatacagtaatattgacacagcagtaattgtgactgatgaaaatgttc 
9721 aaggacttcattttcaactctttctttcctctgttccttatttccacatatctctcaagc 
9781 tttgtctgtatgttatataataaactacaagcaaccccaactatgttacctaccttcctt 
9841 aggaattattgcttgacccaggtttttttttttttttttttggagacggggtcttgccct 
9901 gttgccaggatggagtgtagtggcgccatctcggctcactgcaatctccaactccctggt 
9961 tcaagcgattctcctgtctcaatctcacgagtagctgggactacaggtatacaccaccac 

10021 gcccggttaattgaccattccatttctttctttctctcttttttttttttttttttgaga 
10081 cagagtcttgctctgttgcccaggctggagtacagaggtgtgatctcacctctccgcaac 
10141 gtctgcctcccaggttgaagccatactcctgcctcagcctctctagtagctgggactaca 
10201 ggcgcgcgccaccacacccggctaatttttgtatttttagtagagatggggtttcaccat 
10261 gttggccaggctggtcttgaactcatgacctcaagtggtccacccgcctcagcctcccaa 
10321 agtgctggaattacaggcttgagccaccgtgcccagcaaccatttcatttcaactagaag 
10381 tttctaaaggagagagcagctttcactaactaaataagattggtcagctttctgtaatcg 
10441 aaagagctaaaatgtttgatcttggtcatttgacagttctgcatacatgtaactagtgtt 
10501 tcttattaggactctgtcttttccctatagTGTGGGAGATCAAGAATTGTTACAAATCAC 
10561 CCCTCAAGGAACCAGGGATGAAATCAGTTTGGATTCTGCAAAAAAGGgtaatggcaaagt 
10621 ttgccaacttaacaggcictgaaaagagagtgggtagatacagtactgtaattagattat 
10681 tctgaagaccatttggg ^cctttacaacccacaaaatctcttggcagagttagagtatca 
10741 ttctctgtcaaatgtcgtggtatggtctgatagatttaaatggtactagactaatgtacc 
10801 tataataagaccttcttgtaactgattgttgccctttcgcttttttttttgtttgtttgt 
10861 ttgtttttttttgagatggggtctcactctgttgcccaggctggagtgcagtgatgcaat 
10921 cttggctcactgcaacctccacctccaaaggctcaagctatcctcccacttcagcctcct 
10981 gagtagctgggactacaggcgcatgccaccacacccggttaattttttgtggttttatag 
11041 agatggggtttcaccatgttaccgaggctggtctcaaactcctggactcaagcagtctgc 
11101 ccacttcagcctcccaaagtgctgcagttacaggcttgagccactgtgcctggcctgccc 
11161 tttacttttaattggtgtatttgtgtttcatcttttacctactggtttttaaatataggg 
11221 agtggtaagtctgtagatagaacagagtattaagtagacttaatggccagtaatctttag 
11281 agtacatcagaaccagttttctgatggccaatctgcttttaattcactcttagacgttag 
11341 agaaataggtgtggtttctgcatagggaaaattctgaaattaavvvvvvvvvvvvvgatc 
11401 ctaagtggaaataatctaggtaaataggaattaaatgaaagagtatgagctacatcttca 
11461 gtatacttggtagtttatgaggttagtttctctaatatagccagttggttgatttccacc 
11521 tccaaggtgtatgaagtatgtatttttttaatgacaattcagtttttgagtaccttgtta 
11581 tttttgtatattttcagCTGCTTGTGAATTTTCTGAGACGGATGTAACAAATACTGAACA 
11641 TCATCAACCCAGTAATAATGATTTGAACACCACTGAGAAGCGTGCAGCTGAGAGGCATCC 
11701 AGAAAAGTATCAGGGTAGTTCTGTTTCAAACTTGCATGTGGAGCCATGTGGCACAAATAC 
11761 TCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAA 
11821 TGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACA 
11881 TAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACTCCCAGCACAGAAAA 
11941 aaaggtagatctgaatgctgatcccctgtgtgagagaaaagaatggaataagcagaaact 
12001 GCCATGCTCAGAGAATCCTAGAGATACTGAAGATGTTCCTTGGATAACACTAAATAGCAG 
12061 CATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAGGTTCTGATGACTC 
12121 ACATGATGGGGAGTCTGAATCAAATGCCAAAGTAGCTGATGTATTGGACGTTCTAAATGA 
12181 GGTAGATGAATATTCTGGTTCTTCAGAGAAAATAGACTTACTGGCCAGTGATCCTCATGA 
12241 ggctttaatatgtaaaagtgaaagagttc^ctccaaatcagtagagagtaatattgaagg 
12301 CCAAATATTTGGGAAAACCTATCGGAAGAAGGCAAGCCTCCCCAACTTAAGCCATGTAAC 
12361 TGAAAATCTAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAAGAGCGTCCCCT 
12421 CACAAATAAATTAAAGCGTAAAAGGAGACCTACATCAGGCCTTCATCCTGAGGATTTTAT 
12481 GAAGAAAGCAGATTTGGCAG1TCAAAAGACTCCTGAAATGATAAATCAGGGAACTAAGCA 
12541 AACGGAGCAGAATGGTCAAGTGATGAATATTACTAATAGTGGTCATGAGAATAAAACAAA 
12601 AGGTGATTCTATTCAGAATGAGAAAAATCCTAACCCAATAGAATCACTCGAAAAAGAATC 
12661 TGCTTTCAAAACGAAAGCTGAACCTATAAGCAGCAGTATAAGCAATATGGAACTCGAATT 
12721 AAATATCCACAATTCAAAAGCACCTAAAAAGAATAGGCTGAGGAGGAAGTCTTCTACCAG 

12781 GCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAATTGTACTGA 
12841 ATTGCAAATTGATAGTTGTTCTAGCAGTGAAGAGATAAAGAAAAAAAAGTACAACCAAAT 
12901 GCCAGTCAGGCACAGCAGAAACCTACAACTCATGGAAGGTAAAGAACCTGCAACTGGAGC 
12961 CAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGACATfiACAGCGATACTTTCCC 
13021 AGAGCTGAAGTTAAGAAATGCACCTGGITCTTTTACTAAGTGTTCAAATACCAGTGAACT 
13081 TAAAGAATTTGTCAATCCTAGCCTTCCAAGAGAAGAAAAAGAAGAGAAACTAGAAACAGT 
13141 TAAAGTGTCTAATAATGCTGAAGACCCCAAAGATCTCATGTTAAGTGGAGAAAGGGTTTT 
13201 GCAAACTGAAAGATCTGTAGAGAGTAGCAGTATTTCAITGGTACCTGGTACTGATTATGG 
13261 CACTCAGGAAAGTATCTCX3TTACTGGAAGTTAGCACTCTAGGGAAGGCAAAAACAGAACC 
13321 AAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGGACTAATTCATGGTlG 
13381 TTCCAAAGATAATAGAAATGACACAGAAGGCTTTAAGTATCCATTGGGACATGAAGTTAA 
13441 CCACAGTCGGGAAACAAGCATAGAAATGGAAGAAAGTGAACTTGATGCTCAGTATTTGCA 
13501 GAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCCGTTTTCAAATCCAGGAAATGC 
13561 AGAAGAGGAATGTGCAACATTCTCTGCCCACTCTGGGTCCTTAAAGAAACAAAGTCCAAA 
13621 AGTCACTTTTGAATGTGAACAAAAGGAAGAAAATCAAGGAAAGAATGAGTCTAATATCAA 
13681 GCCTGTACAGACAGTTAATATCACTGCAGGCTTTCCTCJTGGTTGGTCAGAAAGATAAGCC 
13741 AGTTGATAATGCCAAATGTAGTATCAAAGGAGGCTCTAGGTTTTGTCTATCATCTCAGTT 
13801 CAGAGGCAACGAAACTGGACTCATTACTCCAAATAAACATGGACTTTTACAAAACCCATA 
13861 TCGTATACCACCACTTTTTCCCATCAAGTCATTTGTTAAAACTAAATGTAAGAAAAATCT 
13921 GCTAGAGGAAAACTTTGAGGAACATTCAATGTCACCTGAAAGAGAAATGGGAAATGAGAA 
13981 CATTCCAAGTACAGTGAGCACAATTAGCCGTAATAACATTAGAGAAAATGTTTTTAAAGA 
14041 AGCO^K^CAAGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGTGGGCTCCAGTAT 
14101 TAATGAAATAGGTTCCAGTGATGAAAACATTCAAGCAGAACTAGGTAGAAACAGAGGGCC 
14161 AAAATTGAATGCTATGCTTAGATTAGGGGTTTTGCAACCTGAGGTCTATAAACAAAGTCT 
14221 TCCTGGAAGTAATTGTAAGCATCCTGAAATAAAAAAGCAAGAATATGAAGAAGTAGTTCA 
14281 GACTGTTAATACAGATTTCTCTCCATATCTGAlTrCAGATAACTTAGAACAGCCTATGGG 
14341 AAGTAGTCATGCATCTCAGGTTTGTTCTGAGACACCTGATGACCTGTTAGATGATGGTGA 
14401 AATAAAGGAAGATACTAGTTTTGCTGAAAATGAGATTAAGGAAAGTTCTGCTGTTTTTAG 
14461 CAAAAGCGTCCAGAAAGGAGAGCTTAGCAGGAGTCCTAGCCCTTTCACCCATACACATTT 
14521 GGCTCAGGGTTACCGAAGAGGGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAG 
14581 TGAGGATGAAGAGCTTCCCTGCTTCCAACACTTGTTATTTGGTAAAGTAAACAATATACC 
14641 TTCTCAGTCTACTAGGCATAGCACCGTTGCTACCGAGTGTCTGTCTAAGAACACAGAGGA 
14701 GAATTTATTATCATTGAAGAATAGCTTAAATGACTGCAGTAACCAGGTAATATTGGCAAA 
14761 GGCATCTGAGGAACATCACCTTAGTGAGGAAACAAAATGTTCTGCTAGCTTGTTTTCTTC 
14821 ACAGTGCAGTGAA'irGGAAGACTTGACTGCAAATACAAACACCCAGGATCCTTTCTTGAT 
14881 TGGTTCTTCCAAACAAATGAGGCATCAGTCTGAAAGCCAGGGAGTTGGTCTGAGTGACAA 
14941 GGAATTGGTTTCAGATGATGAAGAAAGAGGAACGGGCTTGGAAGAAAATAATCAAGAAGA 
15001 GCAAAGCATGGATTCAAACTTAGgtattggaaccaggtttttgtgtttgccccagtctat 
15061 ttatagaagtgagctaaatgtttatgcttttggggagcacattttacaaatttccaagta 
15121 tagttaaaggaactgcttcttaaacttgaaacatgttcctcctaaggtgcttttcataga 
15181 aaaaagtccttcacacagctaggacgtcatctttgactgaatgagctttaacatcctaat 
15241 tactggtggacttacttctggtttcattttataaagcaaatccfiggtgtcccaaagcaag 
15301 gaatttaatcattttgtgtgacatgaaagtaaatccagtcctgccaatgagaagaaaaag 
15361 acacagcaagttgcagcgtttatagtctgcttttacatctgaacctctgtttttgttatt 
15421 taagGTGAAGCAGCATCTGGGTGTGAGAGTGAAACAAGCGTCTCTGLAAGACTGCTCAGGG 

15481 CTATCCTCTCAGAGTGACATTTTAACCACTCaggtaaaaagcgtgtgtgtgtgtgcacat 
15541 gcgtgtgtgtggtgtcctttgcattcagtagtatgtatcccacattcttaggtttgctga 
15601 catcatctctttgaatt^atggcacaattgtttgtggttcattgtcvvvvvvvvvvvvvn 
15661 gngaatgtaatcctaatatttcncnccnacttaaaagaataccactccaanggcatciica 
15721 atacatcaatcaattggggaattgggattttccctcnctaacatcantggaataatttca 
15781 tggcattaattgcatgaatgtggttagattaaaaggtgttcatgctagaacttgtagttc 
15841 catactaggtgatttcaattcctgtgctaaaattaatttgtatgatatattntcatttaa 
15901 tggaaagcttctcaaagtatttcattttcttggtaccatttatcgtttttgaAGCAGAGG 
15961 GATACCATGCAACATAACCTGATAAAGCTCCAGCAGGAAATGGCTGAACTAGAAGCTGTG 

16021 ttagaacagcatgggagccagccttctaacagctacccttccatcataagtgactcttct 

16081 GCCCTTGAGGACCTGCGAAATCCAGAACAAAGCACATCAGAAAAAGgtgtgtattgttgg 
16141 ccaaacactgatatcttaagcaaaattctttccttcccctttatctccttctgaagagta 

16201 aggacctagctccaacattttatgatccttgctcagcacatgggtaattatggagccttg 
16261 gttcttgtccctgctcacaactaatataccagtcagagggacccaaggcagtcattcatg 
16321 ttgtcatctgagatacctacaacaagtagatgctatggggagcccatggvvvvvvvvvvv 
16381 vvccattggtgctagcatctgtctgttgcattgcttgtgtttataaaattctgcctgata 
16441 tacttgttaaaaaccaatttgtgtatcatagattgatgcttttgaaaaaaatcagtattc 
16501 taacctgaattatcactatcagaacaaagcagtaaagtagatttgttttctcattccatt 
16561 taaagCAGTATTAACTTCACAGAAAAGTAGTGAATACCCTATAAGCCAGAATCCAGAAGG 
16621 CCTTTCTGCTGACAAGTTTGAGGTGTCTGCAGATAGTTCTACCAGTAAAAATAAAGAACC 

16681 AGGAGTGGAAAGgtaagaaacatcaatgtaaagatgctgtggtatctgacatctttattt 
16741 atattgaactctgattgttaatttttttcaccatactttctccagtttttttgcatacag 
16801 gcatttatacacttttattgctctaggatacttcttttgtttaatcctatataggvvvvv 
16861 vvvvvvvvggataagntcaagagatattttgataggtgatgcagtgatnaattgngaaaa 
16921 tttnctgcctgcttttaatcttcccccgttctttcttcctncctccctcccttcctncct 
16981 cccgtccttncctttcctttccctcccttccnccttctttccntctntctttcctttctt 
17041 tcctgtctacctttctttccttcctcccttccttttcttttctttctttcctttcctttt 
17101 ctttcctttctttcctttcctttctttcttgacagagtcttgctctgtcactcaggctgg 
17161 agtgcagtggcgtgatctcgnctcactgcaacctctgtctcccaggttcaagcaattttc 
17221 ctgcctcagcctcccgagtagctgagattacaggcgccagccaccacacccagctactga 
17281 cctgcttttwwvwwvwvaaacaqctqqqaqatatgqtcrcctcagaccaaccccat 
17341 gttatatgtcaaccctgacatattggcaggcaacatgaatccagacttctaggctgtcat 
174 01 gcgggctcttttttgccagtcatttctgatctctctgacatgagctgtttcatttatgct 
17461 ttggctgcccagfiaagtatgatttgtcctttcacaattggtggcgatggttttctccttc 
17521 catttatctttctagGTCATCCCCTTCTAAATGCCCATCATTAGATGATAGGTGGTACAT 
17581 GCACAGTTGCTCTGGGAGTCTTCAGAATAGAAACTACCCATCTCAAGAGGAGCTCATTAA 
17641 GGTTGTTGATGTGGAGGAGCAACAGCTGGAAGAGTCTGGGCCACACGATTTGACGGAAAC 
17701 ATCTTACTTGCCAAGGCAAGATCTAGgtaatatttcatctgctgtattggaacaaacact 
17761 ytgattttactctgaatcctacataaagatattctggttaaccaacttttagatgtacta 
17821 gtctatcatggacacttttgttatacttaattaagcccactttagaaaaatagctcaagt 
17881 gttaatcaaggtttacttgaaaattattgaaactgttaatccatctatattttaattaat 
17941 ggtttaactaatgattttgaggatgvgggagtcktggtgtactctamatgtattatttca 
18001 ggccaggcatagtggctcacgcctggtaatcccagtayycmrgagcccgaggcaggtgga 
18061 gccagctgaggtcaggagttcaagacctgtcttggccaacatgggngaaaccctgtcttc 
18121 ttcttaaaaaanacaaaaaaaattaactgggttgtgcttaggtgnatgccccgnatccta 
18181 gttnttcttgngggttgagggaggagatcacnttggaccccggaggggngggtgggggng 
18241 agcaggncaaaacacngacccagctggggtggaagggaagcccactcnaaaaaannttnv 
18301 vvvvvvvvvvvvtttttaggaaacaagctactttggatttccaccaacacctgtattcat 
actttgtaattcaacattcatcgttgtgtaaattaaacttctcccattcctttcagAGGG 
AACCCCTTACCTGGAATCTGGAATCAGCCTCTTCTCTGATGACCCTGAATCTGATCCTTC 
TGAAGACAGAGCCCCAGAGTCAGCI^TGTTGGCAACATACCATCTTCAACCTCTGCATT 

gaaagttccccaattgaaagttgcagaatctgcccagagtccagctgctgctcatactac 

tgatactgctgggtataatgcaatggaagaaagtgtgagcagggagaagccagaattgac 

agcttcaacagaaagggtcaacaaaagaatgtccatggtggtgtctggcctgaccccaga 

AGAATTTgtgagtgtatccatatgtatctccctaatgactaagacttaacaacattctgg 

aaagagttttatgtaggtattgtcaattaataacctagaggaagaaatctagaaaacaat 

cacagttctgtgtaatttaatttcgattactaatttctgaaaatttagaaywwwvw 

vwvncccnncccccnaatctgaaatgggggtaacccccccccaaccganacntgggtng 

cntagagantttaatggcccnttctgaggnacanaagcttaagccaggngacgtggancn 

atgngttgtttnttgtttggttacctccagcctgggtgacagagcaagactctgtctaaa 

aaaaaaaaaaaaaaaaatcgactttaaatagttccaggacacgtgtagaacgtgcaggat 

tgctacgtaggtaaacatatgccatggtgggataactagtattctgagctgtgtgctaga 

ggtaactcatgataatggaatatttgatttaatttcagATGCTCGTGTACAAGTTTGCCA 

GAAAACACCACATCACTTTAACTAATCTAAITACTGAAGAGACTACTCATGTTGTTATGA 

AAACAGgtataccaagaacctttacagaataccttgcatctgctgcataaaaccacatfla 
ggcgaggcacggtggcgcatgcctgtaatcgcagcactttgggaggccgaggcgggcaga 
tcacgagattaggagatcgagaccatcctggccagcatggtgaaaccccgtctctactan 
naaatggnaaaattanctgggtgtggtcgcgtgcncctgtagtcccagctactcgtgagg 
ctgaggcaggagaatcacttgaaccggggaaatggaggtttcagtgagcagagatcatnc 
ccctncattccagcctggcgacagagcaaggctccgtcnccnaaaaaataaaaaaaaacg 
tgaacaaataagaatatttgttgagcatagcatggatgatagtcttctaatagtcaatca 
attactttatgaaagacaaataatagttttgctgcttccttacctccttttgttttgggt 
taagatttggagtgtgggccaggcacvvvvvvvvvvvvvgatctatagctagccttggcg 
tctagaagatgggtgttgagaagagggagtggaaagatatttcctctggtcttaacttca 
tatcagcctcccctagacttccaaatatccatacctgctggttataattagtggtgtttt 
cagcctctgattctgtcaccaggggttttagaatcataaatccagattgatcttgggagt 
gtaaaaaactgaggctctttagcttcttaggacagcacttcctgattttgttttcaactt 
ctaatcctttgagtgtttttcattctgcagATGCTGAGTTTGTGTGTGAACGGACACTGA 
AATATTTTCTAGGAATTGCGGGAGGAAAATGGGTAGTTAGCTATTTCTgtaagtataata 
ctatttctcccctcctccctttaacacctcagaattgcatttttacacctaacjBitttaac 
acctaaggtttttgctgatgctgagtctgagttaccaaaaggtctttaaattgtaatact 
aaactacttttatctttaatatcactttgttcaagataagctggtgatgctgggaaaatg 
ggtctcttttataactaataggacctaatctgctcctagcaatgttagcatatgagctag 
ggatttatttaatagtcggcaggaatccatgtgcarcagncaaacttataatgtttaaat 
taaacatcaactctgtctccagaaggaaactgctgctacaagccttattaaagggctgtg 
gctttagagggaaggacctctcctctgtcattcttcctgtgctcttttgtgaatcgctga 
cctctctatctccgtgaaaagagcacgttcttctgctgtatgtaacctgtcttttctatg 
a t( c c t v ^ n/v ^ vvwsn,rvr ^^ 

actccccaaccattnaaaaantgacnggggattattaaaancggcgggaaacatttcacn 

gcccaactaatattgttaaattaaaaccaccaccnctgcnccaaggagggaaactgctgc 

tacaagccttattaaagggctgtggctttagagggaaggacctctcctctgtcattcttc 

ctgtgctcttttgtgaatcgctgacctctctatgtccgtgaaaagagcacgttcttcgtc 

tgtatgtaacctgtcttttctatgatctctttagGGGTGACCCAGTCTATTAAAGAAAGA 

AAAATGCTGAATGAGgtaagtacttgatgttacaaactaaccagagatattcattcagtc 

atatagttaaaaatgtatttgcttccttccatcaatgcaccactttccttaacaatgcac 

aaattttccatgataatgaggatcatcaagaattatgcaggcctgcactgtggctcatac 

ctataatcccagcgctttgggaggctgaggcgcttggatcvvvvvvvvvvvvvaattttt 

tgtatttttagtagagatgaggttcaccatgttggtctagatctggtgtcgaacgtcctg 
21421 acctcaagtgatctgccagcctcagtctcccaaagtgctaggattacaggggtgagccac 
21481 tgcgcctggcctgaatgcctaaaatatgacgtgtctgctccacttccattgaaggaagct 
21541 tctctttctcttatcctgaCgggttgtgtttggtttctttcagCATGATTTTGAAGTCAG 
21601 AGGAGATGTGGTCAATGOAAGAAACCACCAAGGTCCAAAGCGAGCAAGAGAATCCCAGGA 
21661 CAGAAAGgtaaagctccctccctcaagttgacaaaaatctcaccccaccactctgtattc 
21721 cactcccctttgcagagatgggccgcttcattttgtaagacttattacatacatacacag 
21781 tgctagatactttcacacaggttcttttttcactcttccatcccaaccacataaataagt 
21841 attgtctctactttatgaatgataaaactaagagatttagagaggctgtgtaatttggat 
21901 tcccgtctcgggttcagatcvvvvvvvvvvvvvttggcctgattggtgacaaaagtgaga 
21961 tgctcagtccttgaatgacaaagaatgcctgtagagttgcaggtccaactacatatgcac 
22021 ttcaagaagatcttctgaaatctagtagtgttctggacattggactgcttgtccctggga 
22081 agtagcagcagaaatgatcggtggtgaacagaagaaaaagaaaagctcttcctttttgaa 
22141 agtctgttttttgaataaaagccaatattcttttataactagattttccttctctccatt 
22201 cccctgtccctctctcttcctctcttcttCcagATCTTCAGGGGGCTAGAAATCTGTTGC 
22261 TATGGGCCCTTCACCAACATGCCCACAGgtaagagcctgggagaaccccagagttccagc 
22321 accagcctttgtcttacatagtggagtattataagcaaggtcccacgatgggggttcctc 
22381 agattgctgaaatgttctagaggctattctatttctctaccactctccaaacaaaacagc 
22441 acctaaatgttatcotatggcaaaaaaaaactataccttgtcccccttctcaagagcaCg 
22501 aaggtggttaatagttaggattcagtatgttatgtgttcagatggcgttgagctgctgtt 
22561 agtgccvvvvvvvvvvvvvtttgagagactatcaaaccttataccaagtggccttatgga 
22621 gactgataaccagagtacatggcatatcagtggcaaattgacttaaaatccatacccctA 
22681 ctattttaagaccattgtcctttggagcagagagacagactctcccattgagaggtcttg 
22741 ctataagccttcatccggagagtgtagggtagagggcctgggttaagtatgcagattact 
22801 gcagtgattttacatgtaaatgtccattttagATCAACTGGAATGGATGGTACAGCTGTG 
22861 TGGTGCTTCTGTGGTGAAGGAGCTTTCATCATTCACCCTTGGCACAgtaagtattgggtg 
22921 ccctgtcagtgtgggaggacacaatattctctcctgtgagcaagactggcacctgtcagt 
22981 ccctatggatgcccctactgtagcctcagaagtcttctctgcccacatacctgtgccaaa 
23041 agactccatvvvvvvvvvvvvvggtggtacgtgtctgtagttccagctacttgggaggct 
23101 gagatggaaggattgcttgagcccaggaggcagaggtggnannttacgctgagatcacac 
23161 cactgcactccagcctgggtgacagagcaagaccctgtctcaaaaacaaacaaaaaaaat 
23221 gatgaagtgacagttccagtagtcctactttgacactttgaatgctctttccttcctggg 

23281 gatccagGGTGTCCACCCAATTGTGGTTGTGCAGCCAGATGCCTGGACAGAGGACAATGG 
23341 CTTCCATGgtaaggtgcctcgcatgtacctgtgctattagtggggtccttgtgcatgggt 
23401 ttggtttatcactcattacctggtgcttgagtagcacagttcttggcacatttttaaata 
23461 tttgttgaatgaatggctaaaatgtctttttgatgtttttattgttatttgttttatatt 
23521 gtaaaagtaatacatgaactgtttccatggggtgggagtaagatatgaatgttcatcacv 
23581 vvvvvvvvvvvvcagtaatcctnagaactcatacgaccgggcccctggagtcgntgnttn 
23641 gagcctagtccnggagaatgaattgacactaatctctgcttgtgttctctgtctccagCA 
23701 ATTGGGCAGATGTGTGAGGCACCTGTGGTGACCCGAGAGTGGGTGTTGGACAGTGTAGCA 
23761 CTCTACCAGTGCCAGGAGCTGGACACCTACCTGATACCCCAGATCCCCCACAGCCACTAC 
23821 TGACTGCAGCCAGCCACAGGTACAGAGCCACAGGACCCCAAGAATGAGCTTACAAAGTGG 
23881 CCTTTCCAGGCCCTGGGAGCTCCTCTCACTCTTCAGTCCTTCTACTGTCCTGGCTACTAA 
23941 ATATTTTATGTACATCAGCCTGAAAAGGACTTCTGGCTATGCAAGGGTCCCTTAAAGATT 
24001 TTCTGCTTGAAGTCTCCCTTGGAAAT 


Figure 10H 